{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f7554354",
   "metadata": {},
   "source": [
    "# RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval\n",
    "\n",
    "[RAPTOR](https://arxiv.org/pdf/2401.18059.pdf) 논문은 문서의 색인 생성 및 검색에 대한 흥미로운 접근 방식을 제시합니다.\n",
    "\n",
    "[테디노트 논문 요약글(노션)](https://teddylee777.notion.site/RAPTOR-e835d306fc664dc2ad76191dee1cd859?pvs=4)\n",
    "\n",
    "- `leafs` 는 가장 low-level 의 시작 문서 집합입니다. 이 문서들은 임베딩되어 클러스터링됩니다.\n",
    "- 그런 다음 클러스터는 유사한 문서들 간의 정보를 더 높은 수준(더 추상적인)으로 요약합니다.\n",
    "\n",
    "이 과정은 재귀적으로 수행되어, 원본 문서(`leafs`)에서 더 추상적인 요약으로 이어지는 \"트리\"를 형성합니다.\n",
    "\n",
    "`leafs`는 다음과 같은 문서들로 구성될 수 있습니다.\n",
    "\n",
    "- 단일 문서에서의 텍스트 청크(논문에서 보여준 것처럼)\n",
    "- 전체 문서(아래에서 보여주는 것처럼)\n",
    "\n",
    "이번 튜토리얼에서는 LangChain 의 LCEL 문서에 이를 적용해 보도록 하겠습니다. 소스코드 기반의 RAG 시스템을 구축할 때 RAPTOR 방법론을 적용하는 방법에 대해서 다룹니다.\n"
   ]
  },
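  {
   "cell_type": "markdown",
   "id": "e1a2b3c4",
   "metadata": {},
   "source": [
    "The recursive build described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this notebook: `build_tree`, `pair_up`, and `join_summary` are hypothetical names, fixed-size pairing stands in for embedding-based clustering, and string joining stands in for an LLM summary.\n",
    "\n",
    "```python\n",
    "from typing import Callable, List\n",
    "\n",
    "\n",
    "def build_tree(\n",
    "    leafs: List[str],\n",
    "    cluster: Callable[[List[str]], List[List[str]]],\n",
    "    summarize: Callable[[List[str]], str],\n",
    "    max_levels: int = 3,\n",
    ") -> List[List[str]]:\n",
    "    '''Return the tree as a list of levels; level 0 holds the leafs.'''\n",
    "    levels = [leafs]\n",
    "    for _ in range(max_levels):\n",
    "        current = levels[-1]\n",
    "        if len(current) <= 1:\n",
    "            break  # a single node is the root, nothing left to group\n",
    "        groups = cluster(current)\n",
    "        levels.append([summarize(group) for group in groups])\n",
    "    return levels\n",
    "\n",
    "\n",
    "# Hypothetical stand-ins: fixed-size pairing instead of clustering,\n",
    "# and string joining instead of an LLM summary.\n",
    "def pair_up(texts: List[str]) -> List[List[str]]:\n",
    "    return [texts[i : i + 2] for i in range(0, len(texts), 2)]\n",
    "\n",
    "\n",
    "def join_summary(texts: List[str]) -> str:\n",
    "    return 'summary(' + ' + '.join(texts) + ')'\n",
    "\n",
    "\n",
    "tree = build_tree(['a', 'b', 'c', 'd'], pair_up, join_summary)\n",
    "for depth, nodes in enumerate(tree):\n",
    "    print(depth, nodes)\n",
    "```\n",
    "\n",
    "Each pass summarizes the previous level, so the loop stops once a single root summary remains or `max_levels` is reached."
   ]
  },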
  {
   "cell_type": "markdown",
   "id": "2d65e33c",
   "metadata": {},
   "source": [
    "## 환경 설정"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "45420489",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# API 키를 환경변수로 관리하기 위한 설정 파일\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# API 키 정보 로드\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "572c2efd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LangSmith 추적을 시작합니다.\n",
      "[프로젝트명]\n",
      "CH12-RAPTOR\n"
     ]
    }
   ],
   "source": [
    "# LangSmith 추적을 설정합니다. https://smith.langchain.com\n",
    "# !pip install -qU langchain-teddynote\n",
    "from langchain_teddynote import logging\n",
    "\n",
    "# 프로젝트 이름을 입력합니다.\n",
    "logging.langsmith(\"CH12-RAPTOR\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "792c9f09",
   "metadata": {},
   "source": [
    "## 데이터 전처리\n",
    "\n",
    "`doc`은 LCEL 문서의 고유한 웹 페이지입니다. context 는 2,000 토큰 미만에서 10,000 토큰 이상까지 다양합니다."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55ba7426",
   "metadata": {},
   "source": [
    "웹 문서에서 텍스트 데이터를 추출하고, 텍스트의 토큰 수를 계산하여 히스토그램으로 시각화합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1d3506db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA04AAAIjCAYAAAA0vUuxAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAP2FJREFUeJzt3X2cjXX+x/H33BmM4y6MoQYRIhHZdtKGJveP0m65adcSqVD9EpWo3FUmm0aK1JaGJKVaRe5CHokdqhFNhdwMMcxMw2DMnLn1/f0RZzvN8B3HmTmHeT0fj8+D872+13U+1/g2e957nXOdAElGAAAAAICzCvR1AwAAAADg7whOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AcJExxujVV1/1dRv4gwkTJsgYUybPtW7dOq1bt871uGPHjjLG6M477yyT54+Li1NSUlKZPBcA+AuCEwCUAWNMiapjx46+btUjd9xxh5YvX65ff/1Vubm5Sk5O1gcffKDOnTv7ujVJUkREhCZMmKDWrVuXaP6gQYPc/l2cTqeSk5O1cuVKPfzww6pSpYpP+ipL/twbAPhCsK8bAIDyYMCAAW6PBw4cqK5duxYZ3759e1m25RVvv/22Bg8erC1btig2NlYpKSmKiIjQX//6V33xxRe68cYbFR8f79Me69Wrp4kTJ2rfvn3atm1bifd75plnlJSUpJCQENWtW1edOnXSyy+/rFGjRun2229XYmKia+5zzz2nF154oUz66tq163k9jyfO1dt9992nwED+v1cA5QvBCQDKwIIFC9we//nPf1bXrl2LjF9sRo8ercGDB2v69OkaNWqU27YpU6ZowIABKigo8FF3F27FihVKSEhwPX7hhRfUuXNnffbZZ1qyZImuvvpq5eTkSJIKCwtVWFhYqv1UqlRJTqdT+fn5pfo8NhfzvykAXAhDURRFlW29+uqrxvz2gRhXVa5c2UybNs388ssvJicnx+zYscOMHj26yL7GGPPqq6+6jT311FOmsLDQPPTQQ66x7t27m/Xr15uTJ0+aEydOmM8++8y0aNHCbb+4uDiTmZlp6tWrZxYvXmwyMzNNWlqaefHFF01gYOA5z6FixYomPT3d/PTTT9a5Z6pRo0Zm0aJF5siRIyYrK8vEx8ebnj17us0ZNGiQMcaYBg0auI137NjRGGNMx44dXWPr1q0ziYmJ5uqrrzZffPGFycrKMgcPHjSPP/54kf3+aNCgQWft80wP7dq1K3b7k08+aYwxZujQoa6xCRMmFPk3vfXWW81XX31lMjIyTGZmptmxY4d5/vnnS9TXmXNr27at+fLLL01WVpaZPn26a9u6deuKnGPfvn3N888/bw4fPmxOnjxpPv30U3P55Ze79ZSUlGTi4uKKnNPvj2nrLS4uziQlJV3Q+u3du7dJTEw0OTk55ocffjDdunXz+X+XFEVR5yquswOAn1iyZIkeffRRrVy5UqNGjdLOnTs1bdo0xcbGnnO/Z599VpMnT9YDDzygmTNnSvrtrYHLli3TyZMnNWbMGD377LNq0aKFNmzYoAYNGrjtHxQUpFWrVunIkSN67LHH9OWXX+qxxx7T/ffff87nvemmm3TZZZfpvffe06lTp6znV6dOHf33v/9Vt27d9Nprr+mpp55SxYoVtWTJEt1xxx3W/c+mRo0aWrlypbZt26bRo0drx44d+te//qXu3btL+u3tj88884wk6Y033tCAAQM0YMAArV+/3uPnnD9/vqRzv2WuRYsW+uyzzxQaGqrx48dr9OjRWrJkiTp06FDivi677DKtWLFCW7du1ciRI91uCFGcp556Sr169dLUqVP1yiuvqEuXLlqzZo0qVqx4Xufnyc/sfNbvTTfdpNdee03vv/++nnjiCVWsWFEff/yxataseV59AkBZ83l6oyiKKm/1xytOt99+
uzHGmHHjxrnNW7RokSksLDRXXnmla+z3V5xefPFFU1BQYAYOHOjaHhYWZo4ePWreeOMNt2PVqVPHZGRkuI3HxcUZY4x5+umn3eYmJCSYb7755pzn8PDDDxtjjOndu3eJzjk2NtYYY0yHDh3cet2zZ4/Zu3evCQgIMNL5X3EyxpgBAwa4xkJCQsyhQ4fMhx9+6Bpr166d9SrT78t2xUmSycjIMAkJCa7Hf7zi9MgjjxhjjLnsssvOeoxz9XXm3O6///5itxV3xenAgQOmSpUqrvG77rrLGGPMww8/7BoryRUnW29/vOJ0vus3JyfHbaxVq1bGGGMefPDBUv9vj6IoytPiihMA+IGePXuqoKBAr7zyitv4Sy+9pMDAQPXo0cNtPCAgQK+++qoeeeQRDRgwQO+8845rW5cuXVSjRg0tXLhQl112masKCwu1efPmYu909/rrr7s9/uqrr3TllVees+eqVatKkjIzM0t8jps3b9bGjRtdY1lZWfr3v/+tRo0aqUWLFiU6zh9lZmbq3XffdT3Oz8/X119/be3/Qp08eVIOh+Os248dOyZJ6t27twICAjx6jpycHMXFxZV4/jvvvKOTJ0+6Hn/00Uc6dOiQevbs6dHzl9T5rt81a9Zo7969rseJiYk6fvx4qf+bAcCFIDgBgB9o0KCBDh065PaiV/rfXfb++Pa6gQMH6qGHHtLDDz+s999/323bVVddJem37/pJT093q27duqlOnTpu851Op9LT093GMjIyrG+bOnHihCSdMzz88Rx37txZZPxs51hSBw8eLDKWkZGhGjVqeHS8kqpSpco5Q+MHH3ygDRs2aM6cOUpNTdXChQvVp0+f8wpRycnJ53UjiF27dhUZ2717txo2bFjiY3jifNfvL7/8UuQYZfFvBgAXgrvqAcBFaOPGjWrTpo0eeughLVq0SBkZGa5tZ24TPWDAAKWkpBTZ9493RPP0TnA7duyQJLVq1UqffvqpR8cojjnLl8gGBQUVO362/j29ylMS9evXV/Xq1bV79+6zzsnJydHNN9+szp07q1evXurevbv69++vtWvXqmvXriX6XJjT6fRm25LO/fMt7bsCnuGLfzMAuFBccQIAP7B//37Vq1evyBerNm/e3LX993bv3q2uXbuqXr16Wrlypdt+e/bskSSlpaVp7dq1RerLL7/0Ss8bNmzQ0aNHdffdd5foO33279+vZs2aFRn/4zmeCYHVq1d3m+fpFSnp7GHBU//85z8lSatWrbI+7xdffKHRo0erZcuWGjdunKKjo11vl/R2X2euNv5ekyZNtG/fPtfjjIyMIj9bqejP93x6O9/1CwAXI4ITAPiB5cuXKzg4WA899JDb+KOPPqpTp05pxYoVRfZJTExUz549dfXVV2vp0qWuO6etWrVKx48f17hx4xQcXPSNBbVq1fJKz06nU1OnTlWLFi00derUYuf84x//UPv27SX9do433HCD/vznP7u2V65cWffff7+SkpL0008/Sfpf8Lv55ptd8wIDA613+TuXrKwsSUXDmCc6d+6sZ555Rnv37j3n93AV97azrVu3SpJCQ0O93pf021s4fx9e7rrrLtWrV89t/ezZs0d//vOfFRIS4hrr1auXIiMj3Y51Pr15sn4B4GLDW/UAwA8sXbpUX3zxhZ5//nk1bNhQ27ZtU9euXXXHHXdo+vTpbh+k/73Nmzerd+/eWr58uT766CPdcccdyszM1PDhwzV//nxt2bJF77//vn799VdFRkaqV69e2rhxox5++GGv9P3iiy+qZcuWeuyxx9S5c2d99NFHSklJUd26dXXHHXfohhtuUFRUlKTfvjz27rvv1ooVK/TKK6/o6NGjGjRokBo1aqQ777zTdYXjp59+Unx8vGJiYlSzZk0dPXpU/fv3LzYEltSePXuUkZGhYcOGKTMzU1lZWdq8ebPblZji9OjRQ82bN1dwcLDCw8N1yy23qEuXLtq/f79uv/125ebmnnXf8ePH6+ab
b9ayZcu0f/9+1alTRyNGjNCBAwe0YcOGC+rrbI4ePaoNGzYoLi5O4eHhGjlypHbt2qU333zTNeett95Snz59tHLlSi1atEiNGzfWgAEDirzt8Hx683T9AsDFxue39qMoiipvVdwX4IaFhZmXXnrJHDx40OTm5pqdO3eW+Atwb7vtNpOXl2cWLlzouq13x44dzYoVK0xGRobJzs42u3btMm+//bZp27ata78zX4D7x+co7stcz1V/+9vfzMqVK016errJy8szycnJZuHChebmm292m3fmC3CPHj1qsrOzzaZNm4p8Ae6ZeZ9//rlxOp3m8OHD5rnnnjPR0dHF3o48MTGxyP7FfUHrbbfdZn744QeTl5dnvTX5mduRn5GTk2MOHTpkVq1aZR5++GG3W36f7WfWuXNns3jxYnPw4EGTk5NjDh48aBYsWGCaNGlSor7Odm5nthV3O/J+/fqZ559/3qSkpJisrCyzdOlSc8UVVxTZ/9FHHzUHDhwwTqfTfPXVV6Zt27ZFjnmu3or7+V7I+pXOfpt0iqIof6mA038BAAAAAJwFn3ECAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIBFufwC3Hr16ikzM9PXbQAAAADwMYfDoUOHDlnnlbvgVK9ePSUnJ/u6DQAAAAB+on79+tbwVO6C05krTfXr1+eqEwAAAFCOORwOJScnlygXlLvgdEZmZibBCQAAAECJcHMIAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAwqfB6S9/+YuWLFmi5ORkGWPUu3dv6z4dO3ZUQkKCcnJytGvXLg0aNKgMOgUAAABQnvk0OIWFhWnbtm168MEHSzS/YcOGWrZsmdatW6c2bdro5Zdf1ltvvaWuXbuWcqcAAAAAyrNgXz75ypUrtXLlyhLPHzZsmJKSkvTYY49Jknbs2KGbbrpJjz76qD7//PPSahMAAABAOefT4HS+oqKitGbNGrexVatW6eWXXz7rPhUqVFBoaKjrscPhkCRVqlRJBQUFpdInAAAAAP9XqVKlEs+9qIJT3bp1lZqa6jaWmpqqatWqqWLFisrJySmyz9ixYzVx4sQi4x9++KEKCwtLq9XzFKTS+6cokOQv5wmU5lqXWO8AAHjTpf8aNSgoqMRzL6rg5ImYmBjFxsa6HjscDiUnJ6tPnz7KzMz0YWe/FyHpAUmXefm4RyS9Iemwl48LeKq01rrEegcAwNsu/deoDoejyIWZs7moglNKSorCw8PdxsLDw3X8+PFirzZJUl5envLy8oqMO51OOZ3OUunz/OVIqi6pjpePW3j62P5ynkBprXWJ9Q4AgLdd+q9Rg4NLHocuqu9xio+PV3R0tNtYly5dFB8f76OOAAAAAJQHPr8deevWrdW6dWtJUqNGjdS6dWtdccUVkqQpU6Zo3rx5rvmvv/66rrzySk2dOlXNmjXT8OHD1bdvX02fPt0n/QMAAAAoH3wanK6//npt3bpVW7dulSRNnz5dW7du1eTJkyVJERERioyMdM3ft2+fevXqpS5dumjbtm0aPXq0hg4dyq3IAQAAAJQqn37G6csvv1RAQMBZtw8ePLjYfdq2bVuabQEAAACAm4vqM04AAAAA4AsEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAA
gAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYOHz4DRixAglJSXJ6XRq06ZNat++/TnnP/LII9qxY4eys7P1yy+/KDY2VqGhoWXULQAAAIDyyKfBqW/fvoqNjdWkSZPUtm1bbdu2TatWrVLt2rWLnX/33XfrhRde0KRJk3T11Vfr3nvvVb9+/TRlypQy7hwAAABAeeLT4DRq1Ci9+eabmjt3rrZv365hw4YpOztbQ4YMKXb+jTfeqI0bN2rhwoXav3+/Vq9erYULF+pPf/pTGXcOAAAAoDwJ9tUTh4SEqF27doqJiXGNGWO0Zs0aRUVFFbvPf//7Xw0YMEDt27fXN998o0aNGqlnz56aP3/+WZ+nQoUKbm/lczgckqRKlSqpoKDAS2dzoSpKCjpd3hR0+tiVvHxcwFOltdYl1jsAAN526b9GrVSp5D34LDjVqlVLwcHBSk1NdRtPTU1V8+bNi91n4cKFqlWrljZs2KCAgACFhIRo9uzZbuHrj8aOHauJEycWGf/www9VWFh4QefgPaGSGkqq4OXj5km6WlKul48LeKq01rrEegcAwNsu/deoQUElD4U+C06e6Nixo8aNG6cRI0Zo8+bNatKkiWbMmKGnn35azz33XLH7xMTEKDY21vXY4XAoOTlZffr0UWZmZlm1bhEhadzpP73psKQpp/8E/EFprXWJ9Q4AgLdd+q9RHQ5HkQs5Z+Oz4JSenq6CggKFh4e7jYeHhyslJaXYfZ599lnNnz9fc+bMkST98MMPCgsL07///W89//zzMsYU2ScvL095eXlFxp1Op5xOpxfOxBtyJBWeLm8qPH1sfzlPoLTWusR6BwDA2y7916jBwSWPQz67OUR+fr4SEhIUHR3tGgsICFB0dLTi4+OL3ady5co6deqU29iZt9sFBASUXrMAAAAAyjWfvlUvNjZW8+bN07fffquvv/5aI0eOVFhYmOLi4iRJ8+bNU3JyssaNGydJWrp0qUaNGqXvvvvO9Va9Z599VkuXLi0SqAAAAADAW3wanBYtWqTatWtr8uTJqlu3rrZu3aru3bsrLS1NkhQZGekWiJ577jkZY/Tcc8+pfv36+vXXX7V06VI99dRTvjoFAAAAAOVAgKSiHwy6hDkcDp04cUJVq1b1o5tD1JM0QaXzwbtJkg55+biAp0prrUusdwAAvO3Sf416PtnAp1+ACwAAAAAXA4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQA
AAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACw8HlwGjFihJKSkuR0OrVp0ya1b9/+nPOrVaummTNn6tChQ8rJydHOnTvVo0ePMuoWAAAAQHkU7Msn79u3r2JjYzVs2DBt3rxZI0eO1KpVq9SsWTP9+uuvReaHhIRo9erVSktL01133aXk5GQ1aNBAx44dK/vmAQAAAJQbPg1Oo0aN0ptvvqm5c+dKkoYNG6ZevXppyJAhmjp1apH5Q4YMUc2aNXXjjTeqoKBAkrR///6ybBkAAABAOeSz4BQSEqJ27dopJibGNWaM0Zo1axQVFVXsPrfffrvi4+M1a9Ys9e7dW7/++qvee+89TZ06VadOnSp2nwoVKig0NNT12OFwSJIqVarkCl++V1FS0OnypqDTx67k5eMCniqttS6x3gEA8LZL/zVqpUol78Gj4NSoUSMlJSV5sqtLrVq1FBwcrNTUVLfx1NRUNW/evNh9rrzySt1yyy1asGCBevbsqSZNmui1115TSEiIJk+eXOw+Y8eO1cSJE4uMf/jhhyosLLygc/CeUEkNJVXw8nHzJF0tKdfLxwU8VVprXWK9AwDgbZf+a9SgoJKHQo+C0+7du/Xll19qzpw5+uijj5SbWzYnHRgYqLS0NN1///06deqUtmzZovr16+vxxx8/a3CKiYlRbGys67HD4VBycrL69OmjzMzMMunbLkLSuNN/etNhSVNO/wn4g9Ja6xLrHQAAb7v0X6M6HI4iF3LOxqPg1LZtWw0ePFixsbGaOXOmPvjgA82ZM0fffPNNiY+Rnp6ugoIChYeHu42Hh4crJSWl2H0OHz6s/Px8t7flbd++XREREQoJCVF+fn6RffLy8pSXl1dk3Ol0yul0lrjf0pUjqfB0eVPh6WP7y3kCpbXWJdY7AADedum/Rg0OLnkc8uh25Nu2bdPIkSNVr149DRkyRBEREdqwYYMSExP16KOPqlatWtZj5OfnKyEhQdHR0a6xgIAARUdHKz4+vth9Nm7cqCZNmiggIMA11rRpUx06dKjY0AQAAAAA3nBB3+NUWFioxYsXq0+fPhozZoyaNGmiadOm6cCBA5o3b57q1q17zv1jY2N13333aeDAgWrevLlmz56tsLAwxcXFSZLmzZunKVOmuObPnj1bNWvW1IwZM3TVVVepZ8+eGjdunGbNmnUhpwEAAAAA53RBd9Vr166dhgwZov79+ysrK0vTpk3TnDlzdPnll2vChAn69NNPdcMNN5x1/0WLFql27dqaPHmy6tatq61bt6p79+5KS0uTJEVGRrq9Le/gwYPq1q2bpk+fru+//17JycmaMWNGsbcuBwAAAABvCZBkznenRx99VIMHD1azZs20fPlyvfXWW1q+fLmM+d+h6tevr3379ikkJMSb/V4wh8OhEydOqGrVqn50c4h6kiaodD54N0nSIS8fF/BUaa11ifUOAIC3XfqvUc8nG3h0xWn48OF6++23NXfu3LPeyCEtLU333nuvJ4cHAAAAAL/iUXBq2rSpdU5+fr7eeecdTw4PAAAAAH7Fo5tD3HPPPbrrrruKjN91110aOHDgBTcFAAAAAP7Eo+A0duxYpaenFxlPS0vTuHHjLrgpAAAAAPAnHgWnyMhI
JSUlFRnfv3+/IiMjL7gpAAAAAPAnHgWntLQ0XXvttUXGW7durSNHjlxwUwAAAADgTzwKTgsXLtQrr7yiTp06KTAwUIGBgercubNmzJih999/39s9AgAAAIBPeXRXvWeeeUYNGzbU2rVrVVBQIEkKDAzUO++8w2ecAAAAAFxyPApO+fn56t+/v5555hm1bt1aTqdTiYmJ+uWXX7zdHwAAAAD4nEfB6Yxdu3Zp165d3uoFAAAAAPySR8EpMDBQ99xzj6Kjo1WnTh0FBrp/VCo6OtorzQEAAACAP/AoOM2YMUP33HOPli1bph9++EHGGG/3BQAAAAB+w6Pg1L9/f/Xt21crVqzwdj8AAAAA4Hc8uh15Xl6edu/e7e1eAAAAAMAveRScXnrpJT3yyCPe7gUAAAAA/JJHb9W76aab1LlzZ/Xo0UM//vij8vPz3bbfeeedXmkOAAAAAPyBR8Hp2LFjWrx4sbd7AQAAAAC/5FFwGjJkiLf7AAAAAAC/5dFnnCQpKChI0dHRuv/++1WlShVJUkREhMLCwrzWHAAAAAD4A4+uOEVGRmrlypWKjIxUaGioVq9erZMnT2rMmDEKDQ3V8OHDvd0nAAAAAPiMR1ecZsyYoW+//VY1atSQ0+l0jS9evFjR0dFeaw4AAAAA/IFHV5z+8pe/6MYbbyxyN719+/apfv36XmkMAAAAAPyFR1ecAgMDFRQUVGT88ssvV2Zm5gU3BQAAAAD+xKPg9Pnnn2vkyJGux8YYhYWFadKkSVq+fLm3egMAAAAAv+DRW/VGjx6tVatW6ccff1TFihX13nvv6aqrrlJ6erruvvtub/cIAAAAAD7lUXBKTk5W69at1b9/f1177bWqUqWK5syZowULFignJ8fbPQIAAACAT3kUnCSpsLBQCxYs0IIFC7zZDwAAAAD4HY+C0z//+c9zbp8/f75HzQAAAACAP/IoOM2YMcPtcUhIiCpXrqy8vDxlZ2cTnAAAAABcUjy6q17NmjXdyuFwqFmzZtqwYQM3hwAAAABwyfEoOBVn9+7devLJJ4tcjQIAAACAi53XgpMkFRQUqF69et48JAAAAAD4nEefcbrtttvcHgcEBCgiIkIPPfSQNm7c6JXGAAAAAMBfeBScPvnkE7fHxhj9+uuv+uKLLzR69Ghv9AUAAAAAfsOj4BQUFOTtPgAAAADAb3n1M04AAAAAcCny6IrTSy+9VOK5vHUPAAAAwMXOo+B03XXX6brrrlNISIh27twpSWratKkKCwu1ZcsW1zxjjHe6BAAAAAAf8ig4LV26VJmZmRo0aJCOHTsmSapevbri4uL01VdfKTY21ps9AgAAAIBPefQZp9GjR2vs2LGu0CRJx44d09NPP81b8wAAAABccjwKTlWrVlXt2rWLjNeuXVsOh+OCmwIAAAAAf+JRcFq8eLHi4uL017/+VfXr11f9+vX1t7/9TXPmzNF//vMfb/cIAAAAAD7l0Wechg0bpmnTpum9995TSEiIJKmgoEBz5szR448/7tUGAQAAAMDXPApOTqdTDz74oB5//HE1btxYkrRnzx5lZ2d7tTkAAAAA8AcX9AW4ERERioiI0K5duwhNAAAAAC5ZHgWnmjVras2aNfr555+1fPlyRURESJLmzJmjadOmebVBAAAAAPA1j4LT9OnTlZ+fr8jISLcrTR988IG6d+/uteYAAAAAwB949Bmnrl27qlu3bkpOTnYb37Vrlxo0aOCVxgAAAADAX3h0xSksLKzYzzTVrFlTubm5F9wUAAAAAPgTj4LTV199pYEDB7oeG2MUEBCgJ554QuvWrfNacwAAAADgDzx6q94TTzyhtWvX6vrrr1eFChX0r3/9Sy1btlTNmjXVoUMHb/cIAAAAAD7l0RWnH3/8UU2bNtWGDRv06aefKiwsTP/5z3903XXXae/evd7uEQAAAAB86ryvOAUHB2vlypUaNmyYpkyZUho9AQAAAIBfOe8rTgUFBbr22mtLoxcAAAAA8Ese
vVXv3Xff1b333uvtXgAAAADAL3l0c4jg4GANGTJEt956qxISEpSVleW2ffTo0V5pDgAAAAD8wXkFp0aNGmnfvn265pprtGXLFklS06ZN3eYYY7zXHQAAAAD4gfMKTrt27VJERIRuueUWSdL777+v//u//1NaWlqpNAcAAAAA/uC8PuMUEBDg9rhHjx4KCwvzakMAAAAA4G88ujnEGX8MUgAAAABwKTqv4GSMKfIZJj7TBAAAAOBSd16fcQoICNDcuXOVm5srSapYsaJef/31InfVu/POO73XIQAAAAD42HkFp3nz5rk9fvfdd73aDAAAAAD4o/MKTkOGDCmtPgAAAADAb13QzSEAAAAAoDwgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC78ITiNGjFBSUpKcTqc2bdqk9u3bl2i/fv36yRijxYsXl3KHAAAAAMoznwenvn37KjY2VpMmTVLbtm21bds2rVq1SrVr1z7nfg0aNNC0adO0fv36MuoUAAAAQHnl8+A0atQovfnmm5o7d662b9+uYcOGKTs7W0OGDDnrPoGBgVqwYIEmTJigvXv3lmG3AAAAAMqjYF8+eUhIiNq1a6eYmBjXmDFGa9asUVRU1Fn3Gz9+vNLS0vT222/rL3/5yzmfo0KFCgoNDXU9djgckqRKlSqpoKDgAs/AWypKCjpd3hR0+tiVvHxcwFOltdYl1jsAAN526b9GrVSp5D34NDjVqlVLwcHBSk1NdRtPTU1V8+bNi92nQ4cOuvfee9WmTZsSPcfYsWM1ceLEIuMffvihCgsLz7flUhIqqaGkCl4+bp6kqyXlevm4gKdKa61LrHcAALzt0n+NGhRU8lDo0+B0vqpUqaL58+frvvvu05EjR0q0T0xMjGJjY12PHQ6HkpOT1adPH2VmZpZWq+cpQtK4039602FJU07/CfiD0lrrEusdAABvu/RfozocjiIXcc7Gp8EpPT1dBQUFCg8PdxsPDw9XSkpKkfmNGzdWo0aNtHTpUtdYYOBvH9PKz89Xs2bNinzmKS8vT3l5eUWO5XQ65XQ6vXEaXpAjqfB0eVPh6WP7y3kCpbXWJdY7AADedum/Rg0OLnkc8unNIfLz85WQkKDo6GjXWEBAgKKjoxUfH19k/o4dO3TNNdeoTZs2rlqyZInWrVunNm3a6MCBA2XZPgAAAIBywudv1YuNjdW8efP07bff6uuvv9bIkSMVFhamuLg4SdK8efOUnJyscePGKTc3Vz/++KPb/seOHZOkIuMAAAAA4C0+D06LFi1S7dq1NXnyZNWtW1dbt25V9+7dlZaWJkmKjIzUqVOnfNwlAAAAgPLM58FJkmbNmqVZs2YVu61z587n3Hfw4MGl0RIAAAAAuPj8C3ABAAAAwN8RnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4A
AAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgIVfBKcRI0YoKSlJTqdTmzZtUvv27c86d+jQoVq/fr2OHj2qo0ePavXq1eecDwAAAAAXyufBqW/fvoqNjdWkSZPUtm1bbdu2TatWrVLt2rWLnd+pUyctXLhQnTt3VlRUlA4cOKDPP/9c9erVK+POAQAAAJQXPg9Oo0aN0ptvvqm5c+dq+/btGjZsmLKzszVkyJBi5w8YMECzZ8/Wtm3btHPnTg0dOlSBgYGKjo4u484BAAAAlBfBvnzykJAQtWvXTjExMa4xY4zWrFmjqKioEh2jcuXKCgkJ0dGjR4vdXqFCBYWGhroeOxwOSVKlSpVUUFBwAd17U0VJQafLm4JOH7uSl48LeKq01rrEegcAwNsu/deolSqVvAefBqdatWopODhYqampbuOpqalq3rx5iY4xdepUHTp0SGvWrCl2+9ixYzVx4sQi4x9++KEKCwvPu+fSESqpoaQKXj5unqSrJeV6+biAp0prrUusdwAAvO3Sf40aFFTyUOjT4HShxowZo/79+6tTp07KzS3+Bx8TE6PY2FjXY4fDoeTkZPXp00eZmZll1apFhKRxp//0psOSppz+E/AHpbXWJdY7AADedum/RnU4HEUu4pyNT4NTenq6CgoKFB4e7jYeHh6ulJSUc+47evRoPfnkk7r11luVmJh41nl5eXnKy8srMu50OuV0Oj1r3OtyJBWeLm8qPH1sfzlPoLTWusR6BwDA2y7916jBwSWPQz69OUR+fr4SEhLcbuwQEBCg6OhoxcfHn3W/xx9/XM8884y6d++uhISEsmgVAAAAQDnm87fqxcbGat68efr222/19ddfa+TIkQoLC1NcXJwkad68eUpOTta4ceMkSU888YQmT56sv//979q3b5/ratXJkyeVlZXls/MAAAAAcOnyeXBatGiRateurcmTJ6tu3braunWrunfvrrS0NElSZGSkTp065Zo/fPhwhYaG6uOPP3Y7zsSJEzVp0qQy7R0AAABA+eDz4CRJs2bN0qxZs4rd1rlzZ7fHjRo1KouWAAAAAMDF51+ACwAAAAD+juAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQAAAAAFgQnAAAAALAgOAEAAACABcEJAAAAACwITgAAAABgQXACAAAAAAuCEwAAAABYEJwAAAAAwILgBAAAAAAWBCcAAAAAsCA4AQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgB
AAAAgAXBCQAAAAAs/CI4jRgxQklJSXI6ndq0aZPat29/zvl33XWXtm/fLqfTqe+//149evQoo04BAAAAlEc+D059+/ZVbGysJk2apLZt22rbtm1atWqVateuXez8qKgoLVy4UHPmzNF1112nTz75RJ988olatmxZxp0DAAAAKC98HpxGjRqlN998U3PnztX27ds1bNgwZWdna8iQIcXOf+SRR7Ry5UpNmzZNO3bs0Pjx47VlyxY99NBDZdw5AAAAgPIi2JdPHhISonbt2ikmJsY1ZozRmjVrFBUVVew+UVFRio2NdRtbtWqV7rjjjmLnV6hQQaGhoa7HDodDklS7dm1VqlTpAs/AW2pJypaU4eXjZku6TFKBl48LeKq01rrEegcAwNtK8zVqLfnD/2ZXqVKlxHN9Gpxq1aql4OBgpaamuo2npqaqefPmxe5Tt27dYufXrVu32Pljx47VxIkTi4zv2bPHs6YvOiN93QBQhkb6ugEAAFAiI33dgBuHw6HMzMxzzvFpcCoLMTExRa5Q1axZU0ePHvVRR7hUOBwOJScnq379+tb/0IDSwjqEr7EG4Q9Yh7gQDodDhw4dss7zaXBKT09XQUGBwsPD3cbDw8OVkpJS7D4pKSnnNT8vL095eXluY/wHBW/KzMxkTcHnWIfwNdYg/AHrEJ4o6Zrx6c0h8vPzlZCQoOjoaNdYQECAoqOjFR8fX+w+8fHxbvMlqUuXLmedDwAAAADeYHxZffv2NU6n0wwcONA0b97cvP766+bo0aOmTp06RpKZN2+emTJlimt+VFSUycvLM6NGjTLNmjUzEyZMMLm5uaZly5Y+PQ+q/JXD4TDGGONwOHzeC1V+i3VI+bpYg5Q/FOuQKqPyeQPmwQcfNPv27TM5OTlm06ZN5k9/+pNr27p160xcXJzb/Lvuusvs2LHD5OTkmMTERNOjRw+fnwNV/qpChQpmwoQJpkKFCj7vhSq/xTqkfF2sQcofinVIlUUFnP4LAAAAAOAsfP4FuAAAAADg7whOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnIDTxowZI2OMpk+f7hoLDQ3VzJkzlZ6erszMTH300UeqU6eO235XXHGFPvvsM2VlZSk1NVX/+te/FBQU5DanY8eOSkhIUE5Ojnbt2qVBgwaVyTnB/02YMEHGGLfavn27aztrEGWlXr16mj9/vtLT05Wdna3vv/9e7dq1c5szadIkHTp0SNnZ2Vq9erWaNGnitr1GjRp69913dfz4cWVkZOitt95SWFiY25xWrVpp/fr1cjqd+uWXX/T444+X+rnh4pCUlFTk96ExRjNnzpTE70P4B5/f2o+ifF3XX3+92bt3r9m6dauZPn26a/y1114z+/fvN507dzZt27Y1//3vf82GDRtc2wMDA833339vPv/8c9O6dWvTvXt3k5aWZp5//nnXnIYNG5qTJ0+aadOmmebNm5sHH3zQ5Ofnm65du/r8vCnf14QJE0xiYqIJDw931WWXXebazhqkyqKqV69ukpKSzNtvv23at29vGjZsaLp06WKuvPJK15wnnnjCZGRkmNtvv920atXKfPLJJ2bPnj0mNDTUNWf58uXmu+++M3/6059Mhw4dzM8//2wWLFjg2u5wOMzhw4fN/PnzTYsWLUy/fv1MVlaWue+++3z+M6B8X7Vq1XL7XRgdHW2MMaZjx45G4vch5Rfl8wYoyqcVFhZmdu7caaKjo826detcwalq1aomNzfX3Hnnna65zZo1M8YYc8MNNxhJpnv37qagoMD1hc2SzAMPPGCOHTtmQkJCjCTzwgsvmMTERLfnXLhwoVmxYoXPz53yfU2YMMF89913xW5jDVJlVTExMWb9+vXnnHPo0CEzevRo1+OqVasap9Np+vXrZySZ5s2bG2OMadeunWtOt27dTGFhoYmIiDCSzLBhw8yRI0dca/PMc2/fvt3nPwPK/2r69Olm165dRuL3IeUfxVv1UO7NmjVLy5Yt09q1a93G
27VrpwoVKmjNmjWusZ07d2r//v2KioqSJEVFRSkxMVFpaWmuOatWrVK1atXUsmVL15zfH+PMnDPHAK666iolJydrz549evfdd3XFFVdIYg2i7Nx+++369ttvtWjRIqWmpmrLli0aOnSoa3ujRo0UERHhto5OnDihzZs3u63FjIwMJSQkuOasWbNGp06d0g033OCas379euXn57vmrFq1Ss2bN1f16tVL+SxxMQkJCdGAAQP09ttvS+L3IfwDwQnlWr9+/dS2bVuNHTu2yLa6desqNzdXx48fdxtPTU1V3bp1XXNSU1OLbD+z7VxzqlWrpooVK3rtXHBx2rx5s+655x51795dw4cPV6NGjfTVV1+pSpUqrEGUmSuvvFLDhw/Xrl271K1bN82ePVuvvPKKBg4cKOl/a6m4dfT7dfb7F6ySVFhYqKNHj57XegUk6Y477lD16tU1d+5cSfxvMvxDsK8bAHzl8ssv14wZM9SlSxfl5ub6uh2UUytXrnT9PTExUZs3b9b+/fvVt29fOZ1OH3aG8iQwMFDffvutnnrqKUnS1q1bdc0112jYsGF65513fNwdyqN7771XK1as0OHDh33dCuDCFSeUW+3atVN4eLi2bNmi/Px85efnq1OnTvq///s/5efnKzU1VaGhoapWrZrbfuHh4UpJSZEkpaSkKDw8vMj2M9vONef48ePKyckprdPDRer48eP6+eef1aRJE6WkpLAGUSYOHz6sn376yW1s+/btioyMlPS/tVTcOvr9OvvjHc6CgoJUs2bN81qvQGRkpG699Va99dZbrjF+H8IfEJxQbq1du1bXXHON2rRp46pvvvlGCxYsUJs2bfTtt98qLy9P0dHRrn2aNm2qBg0aKD4+XpIUHx+vVq1aqXbt2q45Xbp00fHjx10vQuLj492OcWbOmWMAvxcWFqbGjRvr8OHDSkhIYA2iTGzcuFHNmjVzG2vatKn2798v6bfbRB8+fNhtHTkcDt1www1ua7FGjRpq27ata84tt9yiwMBAbd682TXn5ptvVnDw/97w0qVLF+3YsUPHjh0rrdPDRWbw4MFKS0vTsmXLXGP8PoS/8PkdKijKX+r3d9WTfrv16b59+0ynTp1M27ZtzcaNG83GjRtd28/c+nTlypXm2muvNV27djWpqanF3vp06tSpplmzZmb48OHc+pRy1Ysvvmhuvvlm06BBAxMVFWU+//xzk5aWZmrVqmUk1iBVNnX99debvLw8M3bsWNO4cWNz9913m5MnT5q///3vrjlPPPGEOXr0qLntttvMNddcYxYvXlzs7cgTEhJM+/btzY033mh27tzpdjvyqlWrmsOHD5t58+aZFi1amL59+5qTJ09yO3LKVQEBAWbfvn0mJiamyDZ+H1J+UD5vgKL8pv4YnEJDQ83MmTPNkSNHzMmTJ83HH39swsPD3faJjIw0y5YtM1lZWSYtLc28+OKLJigoyG1Ox44dzZYtW0xOTo7ZvXu3GTRokM/PlfKPWrhwoUlOTjY5OTnmwIEDZuHChW7fncMapMqqevXqZb7//nvjdDrNTz/9ZIYOHVpkzqRJk8zhw4eN0+k0q1evNldddZXb9ho1apgFCxaYEydOmGPHjpk5c+aYsLAwtzmtWrUy69evN06n0xw4cMA88cQTPj93yn+qS5cuxhhTZG1J/D6kfF8Bp/8CAAAAADgLPuMEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYEFwAgAAAAALghMAAAAAWBCcAAAAAMCC4AQA8BsNGjSQMUatW7f2dSsAALghOAEAvMoYc86aMGGCr1ssVuPGjfX222/rwIEDysnJ0d69e/Xee++pXbt2ZdoH4REA/FOwrxsAAFxa6tat6/p7v379NHnyZDVr1sw1dvLkSV+0dU7t2rXT2rVr9cMPP+iBBx7Qjh075HA41Lt3b7300kvq1KmTr1sEAPgBQ1EURVGlUYMGDTIZGRmuxwEBAeaZZ54x
Bw4cMDk5Oea7774z3bp1c21v0KCBMcaY1q1bG0kmMDDQzJkzx2zfvt1cccUVRpK5/fbbTUJCgnE6nWbPnj1m/PjxJigoyHUMY4y59957zX/+8x+TlZVlfv75Z3Pbbbeds8/ExETzzTffmICAgCLbqlWr5vr7NddcY9auXWuys7NNenq6eeONN0xYWJhr+7p168z06dPd9l+8eLGJi4tzPU5KSjJjx441c+bMMSdOnDD79+839913n1v/v7du3Tqf/ztSFEVRMvKDBiiKoqhLtP4YnEaOHGmOHTtm+vXrZ5o2bWpeeOEFk5uba5o0aWIk9+BUoUIF8/HHH5uEhARTq1YtI8ncdNNN5tixY2bgwIGmUaNG5tZbbzV79+4148ePdz2HMcb88ssvpn///qZx48bm5ZdfNidOnDA1atQotsc2bdoYY4zp37//Oc+lcuXKJjk52Xz00UemZcuWpnPnzmbPnj1uoaikwSk9Pd0MHz7cNG7c2IwZM8YUFBSYpk2bGknm+uuvN8YYc8stt5jw8PCz9k1RFEWVefm8AYqiKOoSrT8Gp4MHD5qxY8e6zdm8ebOZOXOmkf4XnDp06GBWr15t1q9fb6pWreqau3r1avPkk0+67f+Pf/zDJCcnux4bY8zkyZNdjytXrmyMMW5Xtn5fffr0McYY06ZNm3Oey9ChQ82RI0dM5cqVXWM9evQwBQUFpk6dOkYqeXB655133OakpKSYBx54wO1ncOaqG0VRFOUfxWecAABlwuFwqH79+tq4caPb+MaNG4vcCGHhwoU6ePCgbrnlFuXk5LjGW7durQ4dOuipp55yjQUFBalSpUqqVKmSnE6nJOn77793bc/Oztbx48dVp06dYvsKCAgoUf9XX321tm3bpuzsbLfeg4KC1KxZM6WlpZXoOH/sT5JSUlLO2h8AwD9wVz0AgN9Zvny5rr32WkVFRbmNV6lSRRMmTFCbNm1c1apVKzVp0sQtYOXn57vtZ4xRYGDx/5P3888/S5KaN29+wX2fOnWqSBALCQkpMu98+gMA+Ad+SwMAykRmZqaSk5PVoUMHt/EOHTrop59+chubPXu2nnzySS1ZskQ333yza3zLli1q1qyZ9uzZU6SMMR71tXXrVv34448aPXp0sVefqlWrJknavn27WrdurcqVK7v1XlhYqJ07d0qSfv31V0VERLi2BwYG6pprrjmvfvLy8iT9diUNAOA/CE4AgDLz4osvasyYMerbt6+aNm2qmJgYtWnTRjNmzCgyd+bMmXr66af12WefucLW5MmTNXDgQI0fP14tWrRQ8+bN1a9fPz377LMX1NfgwYPVtGlTffXVV+rRo4caNWqkVq1aady4cfr0008lSQsWLFBOTo7mzZunli1bqlOnTnr11Vc1f/5819v0vvjiC/Xq1Us9e/ZUs2bNNHv2bFWvXv28eklLS1N2dra6d++uOnXqqGrVqhd0bgAA7/H5B60oiqKoS7OKux35+PHjzYEDB0xubq71duSSzKOPPmqOHz9uoqKijCTTtWtXs2HDBpOVlWWOHTtmNm3aZIYOHeqab4wxvXv3dusjIyPDDBo06Jy9XnXVVWbu3Lnm4MGDJicnxyQlJZkFCxa43TTCdjvy4OBgM2vWLJOenm5SUlLMmDFjir05xCOPPOL23N99952ZMGGC6/G9995r9u/fbwoKCrgdOUVRlJ9UwOm/AAAAAADOgrfqAQAAAIAFwQkAAAAALAhOAAAAAGBBcAIAAAAAC4ITAAAAAFgQnAAAAADAguAEAAAAABYEJwAAAACwIDgBAAAAgAXBCQAAAAAsCE4AAAAAYPH/xdBrWYMESQoAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 1000x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader\n",
    "from bs4 import BeautifulSoup as Soup\n",
    "import tiktoken\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
    "# 토큰 수 계산\n",
    "def num_tokens_from_string(string: str, encoding_name: str):\n",
    "    encoding = tiktoken.get_encoding(encoding_name)\n",
    "    num_tokens = len(encoding.encode(string))\n",
    "    return num_tokens\n",
    "\n",
    "\n",
    "# LCEL 문서 로드\n",
    "url = \"https://python.langchain.com/docs/concepts/lcel/\"\n",
    "loader = RecursiveUrlLoader(\n",
    "    url=url, max_depth=20, extractor=lambda x: Soup(x, \"html.parser\").text\n",
    ")\n",
    "docs = loader.load()\n",
    "\n",
    "# PydanticOutputParser를 사용한 LCEL 문서 로드 (기본 LCEL 문서 외부)\n",
    "url = \"https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html\"\n",
    "loader = RecursiveUrlLoader(\n",
    "    url=url, max_depth=1, extractor=lambda x: Soup(x, \"html.parser\").text\n",
    ")\n",
    "docs_pydantic = loader.load()\n",
    "\n",
    "# Self Query를 사용한 LCEL 문서 로드 (기본 LCEL 문서 외부)\n",
    "url = \"https://python.langchain.com/docs/how_to/self_query/\"\n",
    "loader = RecursiveUrlLoader(\n",
    "    url=url, max_depth=1, extractor=lambda x: Soup(x, \"html.parser\").text\n",
    ")\n",
    "docs_sq = loader.load()\n",
    "\n",
    "# 문서 텍스트\n",
    "docs.extend([*docs_pydantic, *docs_sq])\n",
    "docs_texts = [d.page_content for d in docs]\n",
    "\n",
    "# 각 문서에 대한 토큰 수 계산\n",
    "counts = [num_tokens_from_string(d, \"cl100k_base\") for d in docs_texts]\n",
    "\n",
    "# 토큰 수의 히스토그램을 그립니다.\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.hist(counts, bins=30, color=\"blue\", edgecolor=\"black\", alpha=0.7)\n",
    "plt.title(\"Token Count Distribution\")\n",
    "plt.xlabel(\"Token Count\")\n",
    "plt.ylabel(\"Frequency\")\n",
    "plt.grid(axis=\"y\", alpha=0.75)\n",
    "\n",
    "# 히스토그램을 표시합니다.\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66c9d8ad",
   "metadata": {},
   "source": [
    "문서 텍스트를 정렬합니다. 이때 메타데이터의 `source` 를 기준으로 정렬한 뒤, 모든 문서를 연결합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ff0a783a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "전체 토큰 수: 15591\n"
     ]
    }
   ],
   "source": [
    "# 문서를 출처 메타데이터 기준으로 정렬합니다.\n",
    "d_sorted = sorted(docs, key=lambda x: x.metadata[\"source\"])\n",
    "d_reversed = list(reversed(d_sorted))\n",
    "\n",
    "# 역순으로 배열된 문서의 내용을 연결합니다.\n",
    "concatenated_content = \"\\n\\n\\n --- \\n\\n\\n\".join(\n",
    "    [doc.page_content for doc in d_reversed]\n",
    ")\n",
    "\n",
    "print(\n",
    "    \"전체 토큰 수: %s\"  # 모든 문맥에서의 토큰 수를 출력합니다.\n",
    "    % num_tokens_from_string(concatenated_content, \"cl100k_base\")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50a33263",
   "metadata": {},
   "source": [
    "`RecursiveCharacterTextSplitter`를 사용하여 텍스트를 분할합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8982e516",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 텍스트 분할을 위한 코드\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "# 기준 토큰수\n",
    "chunk_size = 2000\n",
    "\n",
    "# 텍스트 분할기 초기화\n",
    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
    "    chunk_size=chunk_size, chunk_overlap=0\n",
    ")\n",
    "\n",
    "# 주어진 텍스트를 분할\n",
    "texts_split = text_splitter.split_text(concatenated_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcc0990c",
   "metadata": {},
   "source": [
    "다음으로는 분할된 chunk 들을 임베딩하여 vector store 에 저장합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b7d18ea6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain.embeddings import CacheBackedEmbeddings\n",
    "from langchain.storage import LocalFileStore\n",
    "\n",
    "# cache 저장 경로 지정\n",
    "store = LocalFileStore(\"./cache/\")\n",
    "\n",
    "# embeddings 인스턴스를 생성\n",
    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\", disallowed_special=())\n",
    "\n",
    "# CacheBackedEmbeddings 인스턴스를 생성\n",
    "cached_embeddings = CacheBackedEmbeddings.from_bytes_store(\n",
    "    embeddings, store, namespace=embeddings.model\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24bcae7c",
   "metadata": {},
   "source": [
    "## 모델 설정"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a48631ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_teddynote.messages import stream_response\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "\n",
    "# llm 모델 초기화\n",
    "llm = ChatOpenAI(\n",
    "    model=\"gpt-4.1-mini\",\n",
    "    temperature=0,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b89741a7",
   "metadata": {},
   "source": [
    "## 트리 구축\n",
    "\n",
    "트리 구축에서의 클러스터링 접근 방식에 대한 주요 개요입니다.\n",
    "\n",
    "**GMM (가우시안 혼합 모델)**\n",
    "\n",
    "- 다양한 클러스터에 걸쳐 데이터 포인트의 분포를 모델링합니다.\n",
    "- 모델의 베이지안 정보 기준(BIC)을 평가하여 최적의 클러스터 수를 결정합니다.\n",
    "\n",
    "**UMAP (Uniform Manifold Approximation and Projection)**\n",
    "\n",
    "- 클러스터링을 지원합니다.\n",
    "- 고차원 데이터의 차원을 축소합니다.\n",
    "- UMAP은 데이터 포인트의 유사성에 기반하여 자연스러운 그룹화를 강조하는 데 도움을 줍니다.\n",
    "\n",
    "**지역 및 전역 클러스터링**\n",
    "\n",
    "- 데이터를 저차원으로 차원 축소하여 클러스터링을 수행합니다.\n",
    "\n",
    "**임계값 설정**\n",
    "\n",
    "- GMM의 맥락에서 클러스터 멤버십을 결정하기 위해 적용됩니다.\n",
    "- 확률 분포를 기반으로 합니다(데이터 포인트를 ≥ 1 클러스터에 할당).\n",
    "\n",
    "---\n",
    "\n",
    "GMM 및 임계값 설정에 대한 코드는 아래 두 출처에서 언급된 Sarthi et al의 것입니다. \n",
    "\n",
    "**참조**\n",
    "\n",
    "- [원본 저장소](https://github.com/parthsarthi03/raptor/blob/master/raptor/cluster_tree_builder.py)\n",
    "- [소소한 조정](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/clustering.py)"
   ]
  },
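  {
   "cell_type": "markdown",
   "id": "f5a6b7c8",
   "metadata": {},
   "source": [
    "The BIC selection and thresholding ideas above can be sketched without any clustering library. The log-likelihoods and membership probabilities below are made-up illustrative numbers, not results from this notebook; in a real pipeline they would come from a fitted GMM. `bic`, `pick_n_clusters`, and `assign_clusters` are hypothetical helper names.\n",
    "\n",
    "```python\n",
    "import math\n",
    "from typing import Dict, List, Tuple\n",
    "\n",
    "\n",
    "def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:\n",
    "    # BIC = k * ln(n) - 2 * ln(L); lower is better.\n",
    "    return n_params * math.log(n_samples) - 2.0 * log_likelihood\n",
    "\n",
    "\n",
    "def pick_n_clusters(candidates: Dict[int, Tuple[float, int]], n_samples: int) -> int:\n",
    "    # candidates maps n_clusters -> (log_likelihood, n_params)\n",
    "    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))\n",
    "\n",
    "\n",
    "def assign_clusters(probs: List[List[float]], threshold: float) -> List[List[int]]:\n",
    "    # Keep every cluster whose membership probability exceeds the threshold,\n",
    "    # so a point may land in zero, one, or several clusters.\n",
    "    return [[c for c, p in enumerate(row) if p > threshold] for row in probs]\n",
    "\n",
    "\n",
    "# Illustrative values only (assumed, not measured).\n",
    "candidates = {1: (-520.0, 5), 2: (-430.0, 11), 3: (-425.0, 17)}\n",
    "best = pick_n_clusters(candidates, n_samples=100)\n",
    "\n",
    "probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]\n",
    "labels = assign_clusters(probs, threshold=0.3)\n",
    "print(best, labels)\n",
    "```\n",
    "\n",
    "Three clusters fit slightly better than two here, but the extra parameters cost more than the gain, so BIC prefers two. When the threshold is below `1 / n_clusters`, every point keeps at least one cluster, because its largest membership probability cannot fall under that value."
   ]
  },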
  {
   "cell_type": "markdown",
   "id": "a6bd9d71",
   "metadata": {},
   "source": [
    "### 차원 축소\n",
    "\n",
    "`global_cluster_embeddings`\n",
    "\n",
    "- 입력된 임베딩 벡터를 전역적으로 차원 축소하기 위해 UMAP을 적용합니다. 전역적으로 차원을 축소한 결과물을 얻어 추후 클러스터링에 활용합니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- n_neighbors: UMAP에 사용될 이웃(neighbor) 수를 정합니다. 데이터 포인트 하나를 이해할 때 주변 데이터 포인트 개수를 나타냅니다. 입력이 없으면 데이터 개수에 따라 자동으로 계산합니다.\n",
    "- umap.UMAP(...)를 사용하여, 고차원 임베딩을 dim 차원으로 축소합니다.\n",
    "- 축소된 벡터들은 전역적(global)인 구조 파악에 도움이 되는 저차원 표현입니다.\n",
    "\n",
    "---\n",
    "\n",
    "`local_cluster_embeddings`\n",
    "\n",
    "- 선택한 데이터 부분집합에 대해 로컬(국소적) 차원 축소를 수행합니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- 글로벌 차원 축소와 유사하지만, 로컬 차원 축소는 이미 한 번 전역 클러스터링을 통해 추출한 특정 그룹(글로벌 클러스터) 내 데이터에 대해 다시 UMAP을 적용합니다.\n",
    "- 이 과정은 전역적으로 파악된 큰 구조 안에서 더 세밀한 클러스터 구조를 파악하는 데 도움이 됩니다."
   ]
  },
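  {
   "cell_type": "markdown",
   "id": "c9d0e1f2",
   "metadata": {},
   "source": [
    "To get a feel for the default neighbor count described above, the heuristic can be evaluated on its own for a few dataset sizes. `default_n_neighbors` is a hypothetical helper name used only for this illustration.\n",
    "\n",
    "```python\n",
    "def default_n_neighbors(n_points: int) -> int:\n",
    "    # Square root of (n - 1), truncated toward zero.\n",
    "    return int((n_points - 1) ** 0.5)\n",
    "\n",
    "\n",
    "for n in (10, 100, 1000):\n",
    "    print(n, default_n_neighbors(n))\n",
    "# 10 3\n",
    "# 100 9\n",
    "# 1000 31\n",
    "```\n",
    "\n",
    "The count grows slowly with the data size. For one or two points the heuristic yields 0 or 1, which UMAP may reject, so very small inputs may need a floor on this value."
   ]
  },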
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2970a692",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Dict, List, Optional, Tuple\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import umap\n",
    "from langchain.prompts import ChatPromptTemplate\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from sklearn.mixture import GaussianMixture\n",
    "\n",
    "RANDOM_SEED = 42  # 재현성을 위한 고정된 시드 값\n",
    "\n",
    "\n",
    "def global_cluster_embeddings(\n",
    "    embeddings: np.ndarray,\n",
    "    dim: int,\n",
    "    n_neighbors: Optional[int] = None,\n",
    "    metric: str = \"cosine\",\n",
    ") -> np.ndarray:\n",
    "    \"\"\"전역적으로 임베딩 벡터의 차원을 축소하는 함수입니다.\n",
    "\n",
    "    Args:\n",
    "        embeddings (np.ndarray): 차원을 축소할 임베딩 벡터들\n",
    "        dim (int): 축소할 차원의 수\n",
    "        n_neighbors (Optional[int], optional): UMAP에서 사용할 이웃의 수. 기본값은 None으로, 이 경우 데이터 크기에 따라 자동 계산됨\n",
    "        metric (str, optional): 거리 계산에 사용할 메트릭. 기본값은 \"cosine\"\n",
    "\n",
    "    Returns:\n",
    "        np.ndarray: 차원이 축소된 임베딩 벡터들\n",
    "    \"\"\"\n",
    "    # 이웃 수 계산\n",
    "    if n_neighbors is None:\n",
    "        n_neighbors = int((len(embeddings) - 1) ** 0.5)\n",
    "\n",
    "    # UMAP 적용\n",
    "    return umap.UMAP(\n",
    "        n_neighbors=n_neighbors, n_components=dim, metric=metric\n",
    "    ).fit_transform(embeddings)\n",
    "\n",
    "\n",
    "def local_cluster_embeddings(\n",
    "    embeddings: np.ndarray, dim: int, num_neighbors: int = 10, metric: str = \"cosine\"\n",
    ") -> np.ndarray:\n",
    "    \"\"\"로컬(국소적)하게 임베딩 벡터의 차원을 축소하는 함수입니다.\n",
    "\n",
    "    Args:\n",
    "        embeddings (np.ndarray): 차원을 축소할 임베딩 벡터들\n",
    "        dim (int): 축소할 차원의 수\n",
    "        num_neighbors (int, optional): UMAP에서 사용할 이웃의 수. 기본값은 10\n",
    "        metric (str, optional): 거리 계산에 사용할 메트릭. 기본값은 \"cosine\"\n",
    "\n",
    "    Returns:\n",
    "        np.ndarray: 차원이 축소된 임베딩 벡터들\n",
    "    \"\"\"\n",
    "    # UMAP 적용\n",
    "    return umap.UMAP(\n",
    "        n_neighbors=num_neighbors, n_components=dim, metric=metric\n",
    "    ).fit_transform(embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48430adc",
   "metadata": {},
   "source": [
    "### 최적의 클러스터 수 계산"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab30f112",
   "metadata": {},
   "source": [
    "`get_optimal_clusters` \n",
    "\n",
    "- 주어진 임베딩 데이터에 대해 가장 적절한 클러스터 수를 BIC 점수를 기반으로 결정합니다.\n",
    "- GMM과 BIC를 활용해 클러스터 개수를 자동으로 결정하므로, 사전에 클러스터 수를 지정할 필요가 없습니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- 가능한 클러스터 수(1 ~ max_clusters 사이)를 순회하며 각 클러스터 개수로 GMM을 학습합니다.\n",
    "- 각 GMM에 대해 BIC 점수를 계산한 뒤 리스트에 저장합니다.\n",
    "- BIC 점수가 가장 낮은(가장 좋은 성능을 보이는) 클러스터 개수를 선택하여 반환합니다."
   ]
  },
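  {
   "cell_type": "markdown",
   "id": "c3a95d14",
   "metadata": {},
   "source": [
    "The selection step above comes down to mapping the position of the lowest score back to a cluster count. The sketch below shows only that mapping in plain Python; `pick_n_clusters` is a hypothetical helper, and the BIC values are invented numbers, not results from this notebook.\n",
    "\n",
    "```python\n",
    "def pick_n_clusters(bics, first_n=1):\n",
    "    # bics[i] is assumed to be the BIC of the model with (first_n + i) components\n",
    "    best_index = min(range(len(bics)), key=bics.__getitem__)\n",
    "    return first_n + best_index\n",
    "\n",
    "\n",
    "# Invented scores for 1, 2, 3, and 4 components\n",
    "hypothetical_bics = [412.0, 380.5, 395.2, 430.1]\n",
    "print(pick_n_clusters(hypothetical_bics))\n",
    "```\n",
    "\n",
    "Because the candidates start at 1, the index of the minimum is one less than the chosen cluster count, which is why the function below indexes into `n_clusters` instead of returning the index itself."
   ]
  },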
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ade041a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_optimal_clusters(\n",
    "    embeddings: np.ndarray, max_clusters: int = 50, random_state: int = RANDOM_SEED\n",
    ") -> int:\n",
    "    \"\"\"BIC 점수를 기반으로 최적의 클러스터 수를 찾는 함수입니다.\n",
    "\n",
    "    Args:\n",
    "        embeddings (np.ndarray): 클러스터링할 임베딩 벡터들\n",
    "        max_clusters (int, optional): 탐색할 최대 클러스터 수. 기본값은 50\n",
    "        random_state (int, optional): 난수 생성을 위한 시드값. 기본값은 RANDOM_SEED\n",
    "\n",
    "    Returns:\n",
    "        int: BIC 점수가 가장 낮은(최적의) 클러스터 수\n",
    "    \"\"\"\n",
    "    # 최대 클러스터 수와 임베딩의 길이 중 작은 값을 최대 클러스터 수로 설정\n",
    "    max_clusters = min(max_clusters, len(embeddings))\n",
    "    # 1부터 최대 클러스터 수까지의 범위를 생성\n",
    "    n_clusters = np.arange(1, max_clusters)\n",
    "\n",
    "    # BIC 점수를 저장할 리스트\n",
    "    bics = []\n",
    "    for n in n_clusters:\n",
    "        gm = GaussianMixture(n_components=n, random_state=random_state)\n",
    "        gm.fit(embeddings)\n",
    "        # 학습된 모델의 BIC 점수를 리스트에 추가\n",
    "        bics.append(gm.bic(embeddings))\n",
    "\n",
    "    # BIC 점수가 가장 낮은 클러스터 수를 반환\n",
    "    return n_clusters[np.argmin(bics)]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82573eae",
   "metadata": {},
   "source": [
    "### 클러스터링 수행"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c27119b8",
   "metadata": {},
   "source": [
    "`GMM_cluster` \n",
    "\n",
    "- GMM을 이용해 주어진 임베딩에 대해 클러스터를 할당합니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- `get_optimal_clusters` 를 통해 최적의 클러스터 수를 찾습니다.\n",
    "- `GaussianMixture` 모델을 해당 클러스터 수로 학습합니다.\n",
    "- 각 데이터 포인트가 각 클러스터에 속할 확률(predict_proba)을 구합니다.\n",
    "- 주어진 threshold를 바탕으로, 확률이 임계값을 초과하는 클러스터만 레이블로 할당합니다."
   ]
  },
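  {
   "cell_type": "markdown",
   "id": "d48f0b27",
   "metadata": {},
   "source": [
    "The thresholding step is what allows a point to land in more than one cluster. The sketch below illustrates it in plain Python: `labels_above_threshold` is a hypothetical helper, and the probability rows are invented stand-ins for the output of `predict_proba`.\n",
    "\n",
    "```python\n",
    "def labels_above_threshold(probs, threshold):\n",
    "    # For each row of membership probabilities, keep every cluster index\n",
    "    # whose probability is strictly greater than the threshold.\n",
    "    return [[k for k, p in enumerate(row) if p > threshold] for row in probs]\n",
    "\n",
    "\n",
    "# Invented membership probabilities for three points and three clusters\n",
    "hypothetical_probs = [\n",
    "    [0.95, 0.03, 0.02],\n",
    "    [0.55, 0.40, 0.05],\n",
    "    [0.05, 0.05, 0.90],\n",
    "]\n",
    "print(labels_above_threshold(hypothetical_probs, threshold=0.1))\n",
    "```\n",
    "\n",
    "With a low threshold such as 0.1, the second point is assigned to two clusters, while the other two points keep a single label."
   ]
  },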
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f86a6d1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def GMM_cluster(embeddings: np.ndarray, threshold: float, random_state: int = 0):\n",
    "    # 최적의 클러스터 수 산정\n",
    "    n_clusters = get_optimal_clusters(embeddings)\n",
    "\n",
    "    # 가우시안 혼합 모델을 초기화\n",
    "    gm = GaussianMixture(n_components=n_clusters, random_state=random_state)\n",
    "    gm.fit(embeddings)\n",
    "\n",
    "    # 임베딩이 각 클러스터에 속할 확률을 예측\n",
    "    probs = gm.predict_proba(embeddings)\n",
    "\n",
    "    # 임계값을 초과하는 확률을 가진 클러스터를 레이블로 선택\n",
    "    labels = [np.where(prob > threshold)[0] for prob in probs]\n",
    "\n",
    "    # 레이블과 클러스터 수를 반환\n",
    "    return labels, n_clusters"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a80d9bd6",
   "metadata": {},
   "source": [
    "`perform_clustering` \n",
    "\n",
    "- 전역 차원 축소, 전역 클러스터링, 이후 로컬 차원 축소 및 로컬 클러스터링까지 전체 클러스터링 파이프라인을 수행하는 핵심 함수입니다.\n",
    "- 이전의 과정을 하나의 파이프라인으로 만들어 종합하는 역할을 수행합니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- 입력된 embeddings가 충분한지 확인(적은 경우 단순 할당).\n",
    "- 전역 차원 축소: `global_cluster_embeddings` 로 전체 임베딩에 대해 UMAP 적용.\n",
    "- 전역 클러스터링: 전역 차원 축소 결과에 대해 `GMM_cluster` 를 사용하여 전역 클러스터 형성.\n",
    "- 각 전역 클러스터에 속하는 데이터만 추출 -> 해당 집합에 대해 로컬 차원 축소(`local_cluster_embeddings`) 수행.\n",
    "- 로컬 차원 축소 결과에 대해 다시 `GMM_cluster` 로 로컬 클러스터링 수행.\n",
    "- 최종적으로, 각 데이터 포인트에 대해서 전역 및 로컬 클러스터 레이블을 함께 반환합니다."
   ]
  },
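  {
   "cell_type": "markdown",
   "id": "e59a1c38",
   "metadata": {},
   "source": [
    "The last step, keeping local cluster ids unique across global clusters, is done with a running offset. The sketch below isolates that bookkeeping in plain Python. `merge_local_labels` and its input layout are assumptions made for illustration; the actual function works on numpy arrays and matches embeddings by value.\n",
    "\n",
    "```python\n",
    "def merge_local_labels(per_global_cluster):\n",
    "    # per_global_cluster: one (member_indices, local_labels, n_local) tuple\n",
    "    # for each global cluster, in the order the clusters are processed.\n",
    "    merged = {}\n",
    "    total_clusters = 0\n",
    "    for member_indices, local_labels, n_local in per_global_cluster:\n",
    "        for idx, labels in zip(member_indices, local_labels):\n",
    "            # Offset local ids by the number of clusters already emitted\n",
    "            shifted = [label + total_clusters for label in labels]\n",
    "            merged.setdefault(idx, []).extend(shifted)\n",
    "        total_clusters += n_local\n",
    "    return merged\n",
    "\n",
    "\n",
    "# Invented input: point 2 belongs to both global clusters\n",
    "hypothetical = [\n",
    "    ([0, 1, 2], [[0], [1], [0, 1]], 2),\n",
    "    ([2, 3], [[0], [0]], 1),\n",
    "]\n",
    "print(merge_local_labels(hypothetical))\n",
    "```\n",
    "\n",
    "Local cluster 0 of the second global cluster becomes id 2 in the merged result, so it cannot be confused with local cluster 0 of the first global cluster."
   ]
  },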
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "32e97d75",
   "metadata": {},
   "outputs": [],
   "source": [
    "def perform_clustering(\n",
    "    embeddings: np.ndarray,\n",
    "    dim: int,\n",
    "    threshold: float,\n",
    ") -> List[np.ndarray]:\n",
    "    \"\"\"\n",
    "    임베딩에 대해 계층적 클러스터링을 수행하는 함수입니다.\n",
    "\n",
    "    전역 차원 축소와 클러스터링을 먼저 수행한 후, 각 전역 클러스터 내에서\n",
    "    로컬 차원 축소와 클러스터링을 수행합니다.\n",
    "\n",
    "    Args:\n",
    "        embeddings (np.ndarray): 클러스터링할 임베딩 벡터들\n",
    "        dim (int): 차원 축소 시 목표 차원 수\n",
    "        threshold (float): GMM 클러스터링에서 사용할 확률 임계값\n",
    "\n",
    "    Returns:\n",
    "        List[np.ndarray]: 각 데이터 포인트에 대한 로컬 클러스터 레이블 리스트.\n",
    "                         각 레이블은 해당 데이터 포인트가 속한 로컬 클러스터의 인덱스를 담은 numpy 배열입니다.\n",
    "    \"\"\"\n",
    "\n",
    "    if len(embeddings) <= dim + 1:\n",
    "        # 데이터가 충분하지 않을 때 클러스터링을 피합니다.\n",
    "        return [np.array([0]) for _ in range(len(embeddings))]\n",
    "\n",
    "    # 글로벌 차원 축소\n",
    "    reduced_embeddings_global = global_cluster_embeddings(embeddings, dim)\n",
    "\n",
    "    # 글로벌 클러스터링\n",
    "    global_clusters, n_global_clusters = GMM_cluster(\n",
    "        reduced_embeddings_global, threshold\n",
    "    )\n",
    "\n",
    "    # 로컬 클러스터링을 위한 초기화\n",
    "    all_local_clusters = [np.array([]) for _ in range(len(embeddings))]\n",
    "    total_clusters = 0\n",
    "\n",
    "    # 각 글로벌 클러스터를 순회하며 로컬 클러스터링 수행\n",
    "    for i in range(n_global_clusters):\n",
    "        # 현재 글로벌 클러스터에 속하는 임베딩 추출\n",
    "        global_cluster_embeddings_ = embeddings[\n",
    "            np.array([i in gc for gc in global_clusters])\n",
    "        ]\n",
    "\n",
    "        if len(global_cluster_embeddings_) == 0:\n",
    "            continue\n",
    "        if len(global_cluster_embeddings_) <= dim + 1:\n",
    "            # 작은 클러스터는 직접 할당으로 처리\n",
    "            local_clusters = [np.array([0]) for _ in global_cluster_embeddings_]\n",
    "            n_local_clusters = 1\n",
    "        else:\n",
    "            # 로컬 차원 축소 및 클러스터링\n",
    "            reduced_embeddings_local = local_cluster_embeddings(\n",
    "                global_cluster_embeddings_, dim\n",
    "            )\n",
    "            local_clusters, n_local_clusters = GMM_cluster(\n",
    "                reduced_embeddings_local, threshold\n",
    "            )\n",
    "\n",
    "        # 로컬 클러스터 ID 할당, 이미 처리된 총 클러스터 수를 조정\n",
    "        for j in range(n_local_clusters):\n",
    "            local_cluster_embeddings_ = global_cluster_embeddings_[\n",
    "                np.array([j in lc for lc in local_clusters])\n",
    "            ]\n",
    "            indices = np.where(\n",
    "                (embeddings == local_cluster_embeddings_[:, None]).all(-1)\n",
    "            )[1]\n",
    "            for idx in indices:\n",
    "                all_local_clusters[idx] = np.append(\n",
    "                    all_local_clusters[idx], j + total_clusters\n",
    "                )\n",
    "\n",
    "        total_clusters += n_local_clusters\n",
    "\n",
    "    return all_local_clusters"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e051d771",
   "metadata": {},
   "source": [
    "주어진 텍스트 리스트를 임베딩 모델을 이용해 벡터로 변환합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "175963c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def embed(texts):\n",
    "    \"\"\"\n",
    "    주어진 텍스트 리스트를 임베딩 벡터로 변환합니다.\n",
    "\n",
    "    Args:\n",
    "        texts (List[str]): 임베딩할 텍스트 리스트\n",
    "\n",
    "    Returns:\n",
    "        np.ndarray: 텍스트의 임베딩 벡터를 포함하는 numpy 배열\n",
    "                   shape은 (텍스트 개수, 임베딩 차원)입니다.\n",
    "    \"\"\"\n",
    "    text_embeddings = embeddings.embed_documents(texts)\n",
    "\n",
    "    # 임베딩을 numpy 배열로 변환\n",
    "    text_embeddings_np = np.array(text_embeddings)\n",
    "    return text_embeddings_np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a74db77a",
   "metadata": {},
   "source": [
    "`embed_cluster_texts` \n",
    "\n",
    "- 텍스트 리스트를 임베딩하고, 위에서 정의한 클러스터링 절차를 수행한 뒤 결과를 데이터프레임 형태로 반환합니다\n",
    "\n",
    "**과정**\n",
    "\n",
    "- embed 함수를 통해 텍스트를 임베딩합니다.\n",
    "- perform_clustering를 호출하여 클러스터 라벨을 얻습니다.\n",
    "- 원본 텍스트, 임베딩, 클러스터 정보를 하나의 DataFrame에 통합하여 반환합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "351f42b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def embed_cluster_texts(texts):\n",
    "    # 임베딩 생성\n",
    "    text_embeddings_np = embed(texts)\n",
    "    # 클러스터링 수행\n",
    "    cluster_labels = perform_clustering(text_embeddings_np, 10, 0.1)\n",
    "    # 결과를 저장할 DataFrame 초기화\n",
    "    df = pd.DataFrame()\n",
    "    # 원본 텍스트 저장\n",
    "    df[\"text\"] = texts\n",
    "    # DataFrame에 리스트로 임베딩 저장\n",
    "    df[\"embd\"] = list(text_embeddings_np)\n",
    "    # 클러스터 라벨 저장\n",
    "    df[\"cluster\"] = cluster_labels\n",
    "    return df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a4a3ecd",
   "metadata": {},
   "source": [
    "`fmt_txt` 함수는 `pandas`의 `DataFrame`에서 텍스트 문서를 단일 문자열로 포맷팅합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "903883eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fmt_txt(df: pd.DataFrame) -> str:\n",
    "    \"\"\"\n",
    "    주어진 DataFrame에서 텍스트 문서를 단일 문자열로 포맷팅하는 함수입니다.\n",
    "\n",
    "    Args:\n",
    "        df (pd.DataFrame): 포맷팅할 텍스트 문서를 포함한 DataFrame\n",
    "\n",
    "    Returns:\n",
    "        str: 텍스트 문서들을 특정 구분자로 결합한 단일 문자열\n",
    "    \"\"\"\n",
    "    unique_txt = df[\"text\"].tolist()\n",
    "    return \"--- --- \\n --- --- \".join(unique_txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecc49b18",
   "metadata": {},
   "source": [
    "`embed_cluster_summarize_texts` \n",
    "\n",
    "- 텍스트 리스트에 대해 임베딩 → 클러스터링 → 요약 까지 전체 프로세스를 수행합니다.\n",
    "\n",
    "**과정**\n",
    "\n",
    "- 임베딩 & 클러스터링: `embed_cluster_texts` 함수를 이용해 입력된 텍스트를 임베딩하고 클러스터링한 결과를 `df_clusters` 로 얻습니다. 이 `df_clusters` 는 각 문서와 그 문서를 할당받은 (하나 이상일 수 있는) 클러스터를 가지고 있습니다.\n",
    "  \n",
    "- 클러스터 할당 확장: 어떤 문서가 여러 클러스터에 속할 수 있으므로, 이를 행 단위로 '문서-클러스터' 페어로 확장한 `expanded_df` 를 만듭니다. 이렇게 하면 이후 처리(특히 요약 단계)에서 각 클러스터별로 문서를 쉽게 그룹화할 수 있습니다.\n",
    "\n",
    "- LLM(대형 언어 모델)을 이용한 요약: 각 클러스터에 속한 문서들의 텍스트를 하나의 문자열로 합친 뒤(`fmt_txt` 사용), 프롬프트 템플릿을 통해 LLM에 전달합니다. LLM은 해당 클러스터에 대한 요약 문장을 생성합니다.\n",
    "\n",
    "- 요약 결과 정리: 클러스터별 요약 결과를 `df_summary` DataFrame에 저장합니다. 여기에는 summaries(요약문), level(입력 파라미터로 받은 처리 수준), cluster(클러스터 식별자)가 포함됩니다."
   ]
  },
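  {
   "cell_type": "markdown",
   "id": "f60b2d49",
   "metadata": {},
   "source": [
    "The expansion into document-cluster pairs can be pictured without pandas. The sketch below is a plain-Python illustration of the same idea; `group_texts_by_cluster` is a hypothetical helper, and the texts and labels are invented.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "def group_texts_by_cluster(texts, cluster_labels):\n",
    "    # Each text may carry several cluster labels, so it is appended\n",
    "    # once to every cluster it belongs to.\n",
    "    grouped = defaultdict(list)\n",
    "    for text, labels in zip(texts, cluster_labels):\n",
    "        for cluster in labels:\n",
    "            grouped[cluster].append(text)\n",
    "    return dict(grouped)\n",
    "\n",
    "\n",
    "# Invented example: 'doc B' belongs to clusters 0 and 1\n",
    "texts = ['doc A', 'doc B', 'doc C']\n",
    "cluster_labels = [[0], [0, 1], [1]]\n",
    "grouped = group_texts_by_cluster(texts, cluster_labels)\n",
    "print(grouped)\n",
    "```\n",
    "\n",
    "A document with two labels contributes to two summaries, which is the effect the expanded DataFrame achieves in the function below."
   ]
  },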
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "c242240e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def embed_cluster_summarize_texts(\n",
    "    texts: List[str], level: int\n",
    ") -> Tuple[pd.DataFrame, pd.DataFrame]:\n",
    "    \"\"\"\n",
    "    텍스트 목록에 대해 임베딩, 클러스터링 및 요약을 수행합니다. 이 함수는 먼저 텍스트에 대한 임베딩을 생성하고,\n",
    "    유사성을 기반으로 클러스터링을 수행한 다음, 클러스터 할당을 확장하여 처리를 용이하게 하고 각 클러스터 내의 내용을 요약합니다.\n",
    "\n",
    "    매개변수:\n",
    "    - texts: 처리할 텍스트 문서 목록입니다.\n",
    "    - level: 처리의 깊이나 세부 사항을 정의할 수 있는 정수 매개변수입니다.\n",
    "\n",
    "    반환값:\n",
    "    - 두 개의 데이터프레임을 포함하는 튜플:\n",
    "      1. 첫 번째 데이터프레임(`df_clusters`)은 원본 텍스트, 그들의 임베딩, 그리고 클러스터 할당을 포함합니다.\n",
    "      2. 두 번째 데이터프레임(`df_summary`)은 각 클러스터에 대한 요약, 지정된 세부 수준, 그리고 클러스터 식별자를 포함합니다.\n",
    "    \"\"\"\n",
    "\n",
    "    # 텍스트를 임베딩하고 클러스터링하여 'text', 'embd', 'cluster' 열이 있는 데이터프레임을 생성합니다.\n",
    "    df_clusters = embed_cluster_texts(texts)\n",
    "\n",
    "    # 클러스터를 쉽게 조작하기 위해 데이터프레임을 확장할 준비를 합니다.\n",
    "    expanded_list = []\n",
    "\n",
    "    # 데이터프레임 항목을 문서-클러스터 쌍으로 확장하여 처리를 간단하게 합니다.\n",
    "    for index, row in df_clusters.iterrows():\n",
    "        for cluster in row[\"cluster\"]:\n",
    "            expanded_list.append(\n",
    "                {\"text\": row[\"text\"], \"embd\": row[\"embd\"], \"cluster\": cluster}\n",
    "            )\n",
    "\n",
    "    # 확장된 목록에서 새 데이터프레임을 생성합니다.\n",
    "    expanded_df = pd.DataFrame(expanded_list)\n",
    "\n",
    "    # 처리를 위해 고유한 클러스터 식별자를 검색합니다.\n",
    "    all_clusters = expanded_df[\"cluster\"].unique()\n",
    "\n",
    "    print(f\"--Generated {len(all_clusters)} clusters--\")\n",
    "\n",
    "    # 요약\n",
    "    template = \"\"\"여기 LangChain 표현 언어 문서의 하위 집합이 있습니다.\n",
    "    \n",
    "    LangChain 표현 언어는 LangChain에서 체인을 구성하는 방법을 제공합니다.\n",
    "    \n",
    "    제공된 문서의 자세한 요약을 제공하십시오.\n",
    "    \n",
    "    문서:\n",
    "    {context}\n",
    "    \"\"\"\n",
    "    prompt = ChatPromptTemplate.from_template(template)\n",
    "    chain = prompt | llm | StrOutputParser()\n",
    "\n",
    "    # 각 클러스터 내의 텍스트를 요약을 위해 포맷팅합니다.\n",
    "    summaries = []\n",
    "    for i in all_clusters:\n",
    "        df_cluster = expanded_df[expanded_df[\"cluster\"] == i]\n",
    "        formatted_txt = fmt_txt(df_cluster)\n",
    "        summaries.append(chain.invoke({\"context\": formatted_txt}))\n",
    "\n",
    "    # 요약, 해당 클러스터 및 레벨을 저장할 데이터프레임을 생성합니다.\n",
    "    df_summary = pd.DataFrame(\n",
    "        {\n",
    "            \"summaries\": summaries,\n",
    "            \"level\": [level] * len(summaries),\n",
    "            \"cluster\": list(all_clusters),\n",
    "        }\n",
    "    )\n",
    "\n",
    "    return df_clusters, df_summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95191d59",
   "metadata": {},
   "source": [
    "`recursive_embed_cluster_summarize`\n",
    "\n",
    "- 텍스트 데이터에 대해 여러 \"단계(Level)\"에 걸쳐 클러스터링과 요약을 반복적으로 수행합니다.\n",
    "- 처음에는 원본 텍스트에 대해 클러스터링 및 요약을 수행한 뒤, 각 클러스터 요약을 다음 단계의 입력 텍스트로 삼아 다시 임베딩 → 클러스터링 → 요약을 반복합니다."
   ]
  },
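  {
   "cell_type": "markdown",
   "id": "a71c3e5a",
   "metadata": {},
   "source": [
    "The recursion and its stopping rule can be sketched independently of embeddings and LLM calls. In the sketch below, `build_levels` and `pair_up` are hypothetical names: `pair_up` is only a stand-in for the embed → cluster → summarize step and simply joins neighboring texts in pairs, so each returned item plays the role of one cluster summary.\n",
    "\n",
    "```python\n",
    "def build_levels(texts, summarize_level, level=1, n_levels=3):\n",
    "    # summarize_level(texts, level) returns one summary per cluster\n",
    "    summaries = summarize_level(texts, level)\n",
    "    results = {level: summaries}\n",
    "    # Recurse only below the maximum level and with more than one cluster\n",
    "    if level < n_levels and len(summaries) > 1:\n",
    "        results.update(\n",
    "            build_levels(summaries, summarize_level, level + 1, n_levels)\n",
    "        )\n",
    "    return results\n",
    "\n",
    "\n",
    "def pair_up(texts, level):\n",
    "    # Stand-in for embed -> cluster -> summarize: joins neighboring texts in pairs\n",
    "    return ['+'.join(texts[i:i + 2]) for i in range(0, len(texts), 2)]\n",
    "\n",
    "\n",
    "tree = build_levels(['a', 'b', 'c', 'd'], pair_up)\n",
    "print(tree)\n",
    "```\n",
    "\n",
    "The recursion stops at level 2 here, before reaching `n_levels`, because only one summary is left to cluster."
   ]
  },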
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "c417dece",
   "metadata": {},
   "outputs": [],
   "source": [
    "def recursive_embed_cluster_summarize(\n",
    "    texts: List[str], level: int = 1, n_levels: int = 3\n",
    ") -> Dict[int, Tuple[pd.DataFrame, pd.DataFrame]]:\n",
    "    # 각 레벨에서의 결과를 저장할 사전\n",
    "    results = {}\n",
    "\n",
    "    # 현재 레벨에 대해 임베딩, 클러스터링, 요약 수행\n",
    "    df_clusters, df_summary = embed_cluster_summarize_texts(texts, level)\n",
    "\n",
    "    # 현재 레벨의 결과 저장\n",
    "    results[level] = (df_clusters, df_summary)\n",
    "\n",
    "    # 추가 재귀가 가능하고 의미가 있는지 결정\n",
    "    unique_clusters = df_summary[\"cluster\"].nunique()\n",
    "\n",
    "    # 현재 레벨이 최대 레벨보다 낮고, 유니크한 클러스터가 1개 이상인 경우\n",
    "    if level < n_levels and unique_clusters > 1:\n",
    "        # 다음 레벨의 재귀 입력 텍스트로 요약 사용\n",
    "        new_texts = df_summary[\"summaries\"].tolist()\n",
    "        next_level_results = recursive_embed_cluster_summarize(\n",
    "            new_texts, level + 1, n_levels\n",
    "        )\n",
    "\n",
    "        # 다음 레벨의 결과를 현재 결과 사전에 병합\n",
    "        results.update(next_level_results)\n",
    "\n",
    "    return results"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8db60197",
   "metadata": {},
   "source": [
    "전체 문서의 개수를 확인합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2de5a73a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 전체 문서의 개수\n",
    "len(docs_texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87ab3e94",
   "metadata": {},
   "source": [
    "이제 `recursive_embed_cluster_summarize` 함수를 호출하여 트리 구축을 시작합니다.\n",
    "\n",
    "- `level=1` 은 첫 번째 단계의 클러스터링 및 요약부터 시작한다는 의미입니다.\n",
    "- `n_levels=3` 은 최대 세 단계까지(조건이 맞는 한) 클러스터링과 요약을 재귀적으로 반복할 수 있다는 뜻입니다.\n",
    "- \n",
    "결과적으로, 원본 텍스트(leaf_texts)는 먼저 level=1에서 요약되고 클러스터링됩니다. 그 결과로 나온 각 클러스터의 요약이 다음 단계의 입력(level=2)이 되고, 이를 다시 요약하여 클러스터링 한 결과가 level=3 단계의 입력이 될 수 있습니다. \n",
    "\n",
    "이 과정을 통해 점차 더 추상적이고 집약된 요약 정보를 얻을 수 있게 됩니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "e429ad5d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--Generated 1 clusters--\n"
     ]
    }
   ],
   "source": [
    "# 트리 구축\n",
    "leaf_texts = docs_texts.copy()\n",
    "\n",
    "# 재귀적으로 임베딩, 클러스터링 및 요약을 수행하여 결과를 얻음\n",
    "results = recursive_embed_cluster_summarize(leaf_texts, level=1, n_levels=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff5156ee",
   "metadata": {},
   "source": [
    "다음으로는 vectorstore를 생성하고 로컬에 저장합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "822f26cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['\\n\\n\\n\\n\\nLangChain Expression Language (LCEL) | \\uf8ffü¶úÔ∏è\\uf8ffüîó LangChain\\n\\n\\n\\n\\n\\n\\nSkip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1\\uf8ffüí¨SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval Augmented Generation (RAG) App: Part 2Build an Extraction ChainBuild an AgentTaggingBuild a Retrieval Augmented Generation (RAG) App: Part 1Build a semantic search engineBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow-to guidesHow to use tools in a chainHow to use a vectorstore as a retrieverHow to add memory to chatbotsHow to use example selectorsHow to add a semantic layer over graph databaseHow to invoke runnables in parallelHow to stream chat model responsesHow to add default invocation args to a RunnableHow to add retrieval to chatbotsHow to use few shot examples in chat modelsHow to do tool/function callingHow to install LangChain packagesHow to add examples to the prompt for query analysisHow to use few shot examplesHow to run custom functionsHow to use output parsers to parse an LLM response into structured formatHow to handle cases where no queries are generatedHow to route between sub-chainsHow to return structured data from a modelHow to summarize text through parallelizationHow to summarize text through iterative refinementHow to summarize text in a single LLM callHow to use toolkitsHow to add ad-hoc tool calling capability to LLMs and Chat ModelsBuild an Agent with AgentExecutor (Legacy)How to construct knowledge graphsHow to partially format prompt templatesHow to handle multiple queries when doing query analysisHow to use built-in tools and toolkitsHow to pass through arguments from one step to the nextHow to compose prompts togetherHow to handle multiple 
retrievers when doing query analysisHow to add values to a chain\\'s stateHow to construct filters for query analysisHow to configure runtime chain internalsHow deal with high cardinality categoricals when doing query analysisCustom Document LoaderHow to use the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callbacks in async environmentsHow to attach callbacks to a runnableHow to propagate callbacks  constructorHow to dispatch custom callback eventsHow to pass callbacks in at runtimeHow to split by characterHow to cache chat model responsesHow to handle rate limitsHow to init any model in one lineHow to track token usage in ChatModelsHow to add tools to chatbotsHow to split codeHow to do retrieval with contextual compressionHow to convert Runnables to ToolsHow to create custom callback handlersHow to create a custom chat model classCustom EmbeddingsHow to create a custom LLM classCustom RetrieverHow to create toolsHow to debug your LLM appsHow to load CSVsHow to load documents from a directoryHow to load HTMLHow to load JSONHow to load MarkdownHow to load Microsoft Office filesHow to load PDFsHow to load web pagesHow to create a dynamic (self-constructing) chainText embedding modelsHow to combine results from multiple retrieversHow to select examples from a LangSmith datasetHow to select examples by lengthHow to select examples by maximal marginal relevance (MMR)How to select examples by n-gram overlapHow to select examples by similarityHow to use reference examples when doing extractionHow to handle long text when doing extractionHow to use prompting alone (no tool calling) to do extractionHow to add fallbacks to a runnableHow to filter messagesHybrid SearchHow to use the LangChain indexing APIHow to inspect runnablesLangChain Expression Language CheatsheetHow to cache LLM responsesHow to track token usage for LLMsRun models locallyHow to get log probabilitiesHow to reorder retrieved results to mitigate the \"lost in the middle\" 
effectHow to split Markdown by HeadersHow to merge consecutive messages of the same typeHow to add message historyHow to migrate from legacy LangChain agents to LangGraphHow to retrieve using multiple vectors per documentHow to pass multimodal data directly to modelsHow to use multimodal promptsHow to create a custom Output ParserHow to use the output-fixing parserHow to parse JSON outputHow to retry when a parsing error occursHow to parse text from message objectsHow to parse XML outputHow to parse YAML outputHow to use the Parent Document RetrieverHow to use LangChain with different Pydantic versionsHow to add chat historyHow to get a RAG application to add citationsHow to do per-user retrievalHow to get your RAG application to return sourcesHow to stream results from your RAG applicationHow to split JSON dataHow to recursively split text by charactersResponse metadataHow to pass runtime secrets to runnablesHow to do \"self-querying\" retrievalHow to split text based on semantic similarityHow to chain runnablesHow to save and load LangChain objectsHow to split text by tokensHow to split HTMLHow to do question answering over CSVsHow to deal with large databases when doing SQL question-answeringHow to better prompt when doing SQL question-answeringHow to do query validation as part of SQL question-answeringHow to stream runnablesHow to stream responses from an LLMHow to use a time-weighted vector store retrieverHow to return artifacts from a toolHow to use chat models to call toolsHow to disable parallel tool callingHow to force models to call a toolHow to access the RunnableConfig from a toolHow to pass tool outputs to chat modelsHow to pass run time values to toolsHow to stream events from a toolHow to stream tool callsHow to convert tools to OpenAI FunctionsHow to handle tool errorsHow to use few-shot prompting with tool callingHow to add a human-in-the-loop for toolsHow to bind model-specific toolsHow to trim messagesHow to create and query vector 
storesConceptual guideAgentsArchitectureAsync programming with langchainCallbacksChat historyChat modelsDocument loadersEmbedding modelsEvaluationExample selectorsFew-shot promptingConceptual guideKey-value storesLangChain Expression Language (LCEL)MessagesMultimodalityOutput parsersPrompt TemplatesRetrieval augmented generation (RAG)RetrievalRetrieversRunnable interfaceStreamingStructured outputsTestingString-in, string-out llmsText splittersTokensTool callingToolsTracingVector storesWhy LangChain?Ecosystem\\uf8ffü¶ú\\uf8ffüõ†Ô∏è LangSmith\\uf8ffü¶ú\\uf8ffüï∏Ô∏è LangGraphVersionsv0.3v0.2Pydantic compatibilityMigrating from v0.0 chainsHow to migrate from v0.0 chainsMigrating from ConstitutionalChainMigrating from ConversationalChainMigrating from ConversationalRetrievalChainMigrating from LLMChainMigrating from LLMMathChainMigrating from LLMRouterChainMigrating from MapReduceDocumentsChainMigrating from MapRerankDocumentsChainMigrating from MultiPromptChainMigrating from RefineDocumentsChainMigrating from RetrievalQAMigrating from StuffDocumentsChainUpgrading to LangGraph memoryHow to migrate to LangGraph memoryHow to use BaseChatMessageHistory with LangGraphMigrating off ConversationBufferMemory or ConversationStringBufferMemoryMigrating off ConversationBufferWindowMemory or ConversationTokenBufferMemoryMigrating off ConversationSummaryMemory or ConversationSummaryBufferMemoryA Long-Term Memory AgentRelease policySecurity PolicyConceptual guideLangChain Expression Language (LCEL)On this pageLangChain Expression Language (LCEL)\\nPrerequisites\\nRunnable Interface\\n\\nThe LangChain Expression Language (LCEL) takes a declarative approach to building new Runnables from existing Runnables.\\nThis means that you describe what should happen, rather than how it should happen, allowing LangChain to optimize the run-time execution of the chains.\\nWe often refer to a Runnable created using LCEL as a \"chain\". 
It\\'s important to remember that a \"chain\" is Runnable and it implements the full Runnable Interface.\\nnote\\nThe LCEL cheatsheet shows common patterns that involve the Runnable interface and LCEL expressions.\\nPlease see the following list of how-to guides that cover common tasks with LCEL.\\nA list of built-in Runnables can be found in the LangChain Core API Reference. Many of these Runnables are useful when composing custom \"chains\" in LangChain using LCEL.\\n\\nBenefits of LCEL‚Äã\\nLangChain optimizes the run-time execution of chains built with LCEL in a number of ways:\\n\\nOptimized parallel execution: Run Runnables in parallel using RunnableParallel or run multiple inputs through a given chain in parallel using the Runnable Batch API. Parallel execution can significantly reduce the latency as processing can be done in parallel instead of sequentially.\\nGuaranteed Async support: Any chain built with LCEL can be run asynchronously using the Runnable Async API. This can be useful when running chains in a server environment where you want to handle large number of requests concurrently.\\nSimplify streaming: LCEL chains can be streamed, allowing for incremental output as the chain is executed. 
LangChain can optimize the streaming of the output to minimize the time-to-first-token(time elapsed until the first chunk of output from a chat model or llm comes out).\\n\\nOther benefits include:\\n\\nSeamless LangSmith tracing\\nAs your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.\\nWith LCEL, all steps are automatically logged to LangSmith for maximum observability and debuggability.\\nStandard API: Because all chains are built using the Runnable interface, they can be used in the same way as any other Runnable.\\nDeployable with LangServe: Chains built with LCEL can be deployed using for production use.\\n\\nShould I use LCEL?‚Äã\\nLCEL is an orchestration solution -- it allows LangChain to handle run-time execution of chains in an optimized way.\\nWhile we have seen users run chains with hundreds of steps in production, we generally recommend using LCEL for simpler orchestration tasks. When the application requires complex state management, branching, cycles or multiple agents, we recommend that users take advantage of LangGraph.\\nIn LangGraph, users define graphs that specify the application\\'s flow. This allows users to keep using LCEL within individual nodes when LCEL is needed, while making it easy to define complex orchestration logic that is more readable and maintainable.\\nHere are some guidelines:\\n\\nIf you are making a single LLM call, you don\\'t need LCEL; instead call the underlying chat model directly.\\nIf you have a simple chain (e.g., prompt + llm + parser, simple retrieval set up etc.), LCEL is a reasonable fit, if you\\'re taking advantage of the LCEL benefits.\\nIf you\\'re building a complex chain (e.g., with branching, cycles, multiple agents, etc.) use LangGraph instead. Remember that you can always use LCEL within individual nodes in LangGraph.\\n\\nComposition Primitives‚Äã\\nLCEL chains are built by composing existing Runnables together. 
The two main composition primitives are RunnableSequence and RunnableParallel.\\nMany other composition primitives (e.g., RunnableAssign) can be thought of as variations of these two primitives.\\nnoteYou can find a list of all composition primitives in the LangChain Core API Reference.\\nRunnableSequence‚Äã\\nRunnableSequence is a composition primitive that allows you \"chain\" multiple runnables sequentially, with the output of one runnable serving as the input to the next.\\nfrom langchain_core.runnables import RunnableSequencechain = RunnableSequence([runnable1, runnable2])API Reference:RunnableSequence\\nInvoking the chain with some input:\\nfinal_output = chain.invoke(some_input)\\ncorresponds to the following:\\noutput1 = runnable1.invoke(some_input)final_output = runnable2.invoke(output1)\\nnoterunnable1 and runnable2 are placeholders for any Runnable that you want to chain together.\\nRunnableParallel‚Äã\\nRunnableParallel is a composition primitive that allows you to run multiple runnables concurrently, with the same input provided to each.\\nfrom langchain_core.runnables import RunnableParallelchain = RunnableParallel({    \"key1\": runnable1,    \"key2\": runnable2,})API Reference:RunnableParallel\\nInvoking the chain with some input:\\nfinal_output = chain.invoke(some_input)\\nWill yield a final_output dictionary with the same keys as the input dictionary, but with the values replaced by the output of the corresponding runnable.\\n{    \"key1\": runnable1.invoke(some_input),    \"key2\": runnable2.invoke(some_input),}\\nRecall, that the runnables are executed in parallel, so while the result is the same as\\ndictionary comprehension shown above, the execution time is much faster.\\nnoteRunnableParallelsupports both synchronous and asynchronous execution (as all Runnables do).\\nFor synchronous execution, RunnableParallel uses a ThreadPoolExecutor to run the runnables concurrently.\\nFor asynchronous execution, RunnableParallel uses asyncio.gather to 
run the runnables concurrently.\\n\\nComposition Syntax‚Äã\\nThe usage of RunnableSequence and RunnableParallel is so common that we created a shorthand syntax for using them. This helps\\nto make the code more readable and concise.\\nThe | operator‚Äã\\nWe have overloaded the | operator to create a RunnableSequence from two Runnables.\\nchain = runnable1 | runnable2\\nis Equivalent to:\\nchain = RunnableSequence([runnable1, runnable2])\\nThe .pipe method`‚Äã\\nIf you have moral qualms with operator overloading, you can use the .pipe method instead. This is equivalent to the | operator.\\nchain = runnable1.pipe(runnable2)\\nCoercion‚Äã\\nLCEL applies automatic type coercion to make it easier to compose chains.\\nIf you do not understand the type coercion, you can always use the RunnableSequence and RunnableParallel classes directly.\\nThis will make the code more verbose, but it will also make it more explicit.\\nDictionary to RunnableParallel‚Äã\\nInside an LCEL expression, a dictionary is automatically converted to a RunnableParallel.\\nFor example, the following code:\\nmapping = {    \"key1\": runnable1,    \"key2\": runnable2,}chain = mapping | runnable3\\nIt gets automatically converted to the following:\\nchain = RunnableSequence([RunnableParallel(mapping), runnable3])\\ncautionYou have to be careful because the mapping dictionary is not a RunnableParallel object, it is just a dictionary. This means that the following code will raise an AttributeError:mapping.invoke(some_input)\\nFunction to RunnableLambda‚Äã\\nInside an LCEL expression, a function is automatically converted to a RunnableLambda.\\ndef some_func(x):    return xchain = some_func | runnable1\\nIt gets automatically converted to the following:\\nchain = RunnableSequence([RunnableLambda(some_func), runnable1])\\ncautionYou have to be careful because the lambda function is not a RunnableLambda object, it is just a function. 
This means that the following code will raise an AttributeError: (lambda x: x + 1).invoke(some_input)\\nLegacy chains\\nLCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as LLMChain and\\nConversationalRetrievalChain. Many of these legacy chains hide important details like prompts, and as a wider variety\\nof viable models emerge, customization has become more and more important.\\nIf you are currently using one of these legacy chains, please see this guide for guidance on how to migrate.\\nFor guides on how to do specific tasks with LCEL, check out the relevant how-to guides.\\nCopyright © 2024 LangChain, Inc.\\n\\n',\n",
       " '\\n\\n\\n\\n\\n\\n\\n\\n\\nPydanticOutputParser ‚Äî \\uf8ffü¶ú\\uf8ffüîó LangChain  documentation\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nSkip to main content\\n\\n\\nBack to top\\n\\n\\n\\n\\nCtrl+K\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n    Reference\\n  \\n\\n\\n\\n\\n\\n\\n\\n\\n\\nCtrl+K\\n\\n\\n\\n\\n\\n\\n\\nDocs\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGitHub\\n\\n\\n\\nX / Twitter\\n\\n\\n\\n\\n\\n\\n\\n\\nCtrl+K\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n    Reference\\n  \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nDocs\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGitHub\\n\\n\\n\\nX / Twitter\\n\\n\\n\\n\\n\\n\\n\\nSection Navigation\\nBase packages\\n\\nCore\\nagents\\nbeta\\ncaches\\ncallbacks\\nchat_history\\nchat_loaders\\nchat_sessions\\ndocument_loaders\\ndocuments\\nembeddings\\nexample_selectors\\nexceptions\\nglobals\\nindexing\\nlanguage_models\\nload\\nmessages\\noutput_parsers\\nBaseGenerationOutputParser\\nBaseLLMOutputParser\\nBaseOutputParser\\nJsonOutputParser\\nSimpleJsonOutputParser\\nCommaSeparatedListOutputParser\\nListOutputParser\\nMarkdownListOutputParser\\nNumberedListOutputParser\\nJsonKeyOutputFunctionsParser\\nJsonOutputFunctionsParser\\nOutputFunctionsParser\\nPydanticAttrOutputFunctionsParser\\nPydanticOutputFunctionsParser\\nJsonOutputKeyToolsParser\\nJsonOutputToolsParser\\nPydanticToolsParser\\nPydanticOutputParser\\nStrOutputParser\\nBaseCumulativeTransformOutputParser\\nBaseTransformOutputParser\\nXMLOutputParser\\ndroplastn\\nmake_invalid_tool_call\\nparse_tool_call\\nparse_tool_calls\\nnested_element\\n\\n\\noutputs\\nprompt_values\\nprompts\\nrate_limiters\\nretrievers\\nrunnables\\nstores\\nstructured_query\\nsys_info\\ntools\\ntracers\\nutils\\nvectorstores\\n\\n\\nLangchain\\nText Splitters\\nCommunity\\nExperimental\\n\\nIntegrations\\n\\nAI21\\nAnthropic\\nAstraDB\\nAWS\\nAzure Dynamic 
Sessions\\nBox\\nCerebras\\nChroma\\nCohere\\nCouchbase\\nDatabricks\\nElasticsearch\\nExa\\nFireworks\\nGoogle Community\\nGoogle GenAI\\nGoogle VertexAI\\nGroq\\nHuggingface\\nIBM\\nMilvus\\nMistralAI\\nNeo4J\\nNomic\\nNvidia Ai Endpoints\\nOllama\\nOpenAI\\nPinecone\\nPostgres\\nPrompty\\nQdrant\\nRedis\\nSema4\\nSnowflake\\nSqlserver\\nStandard Tests\\nTogether\\nUnstructured\\nUpstage\\nVoyageAI\\nWeaviate\\nXAI\\n\\nLangChain Python API Reference\\nlangchain-core: 0.3.28\\noutput_parsers\\nPydanticOutputParser\\n\\nPydanticOutputParser\\n\\nclass langchain_core.output_parsers.pydantic.PydanticOutputParser\\nBases: JsonOutputParser, Generic[TBaseModel]\\nParse an output using a pydantic model.\\n\\nNote\\nPydanticOutputParser implements the standard Runnable Interface. 🏃\\nThe Runnable Interface has additional methods that are available on runnables, such as with_types, with_retry, assign, bind, get_graph, and more.\\n\\nparam diff: bool = False\\nIn streaming mode, whether to yield diffs between the previous and current\\nparsed output, or just the current parsed output.\\n\\nparam pydantic_object: Annotated[type[TBaseModel], SkipValidation()] [Required]\\nThe pydantic model to parse.\\n\\nasync abatch(inputs: list[Input], config: RunnableConfig | list[RunnableConfig] | None = None, *, return_exceptions: bool = False, **kwargs: Any | None) → list[Output]\\nDefault implementation runs ainvoke in parallel using asyncio.gather.\\nThe default implementation of batch works well for IO bound runnables.\\nSubclasses should override this method if they can batch more efficiently;\\ne.g., if the underlying Runnable uses an API which supports a batch mode.\\n\\nParameters:\\n\\ninputs (list[Input]) – A list of inputs to the Runnable.\\nconfig (RunnableConfig | list[RunnableConfig] | None) – A config to use when invoking the 
Runnable.\\nThe config supports standard keys like ‘tags’, ‘metadata’ for tracing\\npurposes, ‘max_concurrency’ for controlling how much work to do\\nin parallel, and other keys. Please refer to the RunnableConfig\\nfor more details. Defaults to None.\\nreturn_exceptions (bool) – Whether to return exceptions instead of raising them.\\nDefaults to False.\\nkwargs (Any | None) – Additional keyword arguments to pass to the Runnable.\\n\\nReturns:\\nA list of outputs from the Runnable.\\n\\nReturn type:\\nlist[Output]\\n\\nasync abatch_as_completed(inputs: Sequence[Input], config: RunnableConfig | Sequence[RunnableConfig] | None = None, *, return_exceptions: bool = False, **kwargs: Any | None) → AsyncIterator[tuple[int, Output | Exception]]\\nRun ainvoke in parallel on a list of inputs,\\nyielding results as they complete.\\n\\nParameters:\\n\\ninputs (Sequence[Input]) – A list of inputs to the Runnable.\\nconfig (RunnableConfig | Sequence[RunnableConfig] | None) – A config to use when invoking the Runnable.\\nThe config supports standard keys like ‘tags’, ‘metadata’ for tracing\\npurposes, ‘max_concurrency’ for controlling how much work to do\\nin parallel, and other keys. Please refer to the RunnableConfig\\nfor more details. Defaults to None. 
\\nreturn_exceptions (bool) – Whether to return exceptions instead of raising them.\\nDefaults to False.\\nkwargs (Any | None) – Additional keyword arguments to pass to the Runnable.\\n\\nYields:\\nA tuple of the index of the input and the output from the Runnable.\\n\\nReturn type:\\nAsyncIterator[tuple[int, Output | Exception]]\\n\\nasync ainvoke(input: str | BaseMessage, config: RunnableConfig | None = None, **kwargs: Any | None) → T\\nDefault implementation of ainvoke, calls invoke from a thread.\\nThe default implementation allows usage of async code even if\\nthe Runnable did not implement a native async version of invoke.\\nSubclasses should override this method if they can run asynchronously.\\n\\nParameters:\\n\\ninput (str | BaseMessage)\\nconfig (RunnableConfig | None)\\nkwargs (Any | None)\\n\\nReturn type:\\nT\\n\\nasync aparse(text: str) → T\\nAsync parse a single string model output into some structure.\\n\\nParameters:\\ntext (str) – String output of a language model.\\n\\nReturns:\\nStructured output.\\n\\nReturn type:\\nT\\n\\nasync aparse_result(result: list[Generation], *, partial: bool = False) → T\\nAsync parse a list of candidate model Generations into a specific format.\\n\\nThe return value is parsed from only the first Generation in the result, which is assumed to be the highest-likelihood Generation.\\n\\nParameters:\\n\\nresult (list[Generation]) – A list of Generations to be parsed. The Generations are assumed\\nto be different candidate outputs for a single model input.\\npartial (bool) – Whether to parse the output as a partial result. This is useful\\nfor parsers that can parse partial results. 
Default is False.\\n\\nReturns:\\nStructured output.\\n\\nReturn type:\\nT\\n\\nasync astream(input: Input, config: RunnableConfig | None = None, **kwargs: Any | None) → AsyncIterator[Output]\\nDefault implementation of astream, which calls ainvoke.\\nSubclasses should override this method if they support streaming output.\\n\\nParameters:\\n\\ninput (Input) – The input to the Runnable.\\nconfig (RunnableConfig | None) – The config to use for the Runnable. Defaults to None.\\nkwargs (Any | None) – Additional keyword arguments to pass to the Runnable.\\n\\nYields:\\nThe output of the Runnable.\\n\\nReturn type:\\nAsyncIterator[Output]\\n\\nasync astream_events(input: Any, config: RunnableConfig | None = None, *, version: Literal[\\'v1\\', \\'v2\\'], include_names: Sequence[str] | None = None, include_types: Sequence[str] | None = None, include_tags: Sequence[str] | None = None, exclude_names: Sequence[str] | None = None, exclude_types: Sequence[str] | None = None, exclude_tags: Sequence[str] | None = None, **kwargs: Any) → AsyncIterator[StandardStreamEvent | CustomStreamEvent]\\nGenerate a stream of events.\\nUse to create an iterator over StreamEvents that provide real-time information\\nabout the progress of the Runnable, including StreamEvents from intermediate\\nresults.\\nA StreamEvent is a dictionary with the following schema:\\n\\nevent: str - Event names are of the format: on_[runnable_type]_(start|stream|end).\\n\\nname: str - The name of the Runnable that generated the event.\\n\\nrun_id: str - Randomly generated ID associated with the given execution of the Runnable that emitted the event.\\nA child Runnable that gets invoked as part of the execution of a\\nparent Runnable is assigned its own unique ID.\\n\\nparent_ids: List[str] - The IDs of the parent runnables that generated the event. 
The root Runnable will have an empty list.\\nThe order of the parent IDs is from the root to the immediate parent.\\nOnly available for v2 version of the API. The v1 version of the API\\nwill return an empty list.\\n\\ntags: Optional[List[str]] - The tags of the Runnable that generated the event.\\n\\nmetadata: Optional[Dict[str, Any]] - The metadata of the Runnable that generated the event.\\n\\ndata: Dict[str, Any]\\n\\nBelow is a table that illustrates some events that might be emitted by various\\nchains. Metadata fields have been omitted from the table for brevity.\\nChain definitions have been included after the table.\\nATTENTION This reference table is for the V2 version of the schema.\\n\\nevent | name | chunk | input | output\\non_chat_model_start | [model name] | | {“messages”: [[SystemMessage, HumanMessage]]} |\\non_chat_model_stream | [model name] | AIMessageChunk(content=“hello”) | |\\non_chat_model_end | [model name] | | {“messages”: [[SystemMessage, HumanMessage]]} | AIMessageChunk(content=“hello world”)\\non_llm_start | [model name] | | {‘input’: ‘hello’} |\\non_llm_stream | [model name] | ‘Hello’ | |\\non_llm_end | [model name] | | ‘Hello human!’ |\\non_chain_start | format_docs | | |\\non_chain_stream | format_docs | “hello world!, goodbye world!” | |\\non_chain_end | format_docs | | [Document(…)] | “hello world!, goodbye world!”\\non_tool_start | some_tool | | {“x”: 1, “y”: “2”} |\\non_tool_end | some_tool | | | {“x”: 1, “y”: “2”}\\non_retriever_start | [retriever name] | | {“query”: “hello”} |\\non_retriever_end | [retriever name] | | {“query”: “hello”} | [Document(…), ..]\\non_prompt_start | [template_name] | | {“question”: “hello”} |\\non_prompt_end | [template_name] | | {“question”: “hello”} | ChatPromptValue(messages: [SystemMessage, 
…])\\n\\nIn addition to the standard events, users can also dispatch custom events (see example below).\\nCustom events will only be surfaced in the v2 version of the API!\\nA custom event has the following format:\\n\\nAttribute | Type | Description\\nname | str | A user defined name for the event.\\ndata | Any | The data associated with the event. This can be anything, though we suggest making it JSON serializable.\\n\\nHere are declarations associated with the standard events shown above:\\nformat_docs:\\ndef format_docs(docs: List[Document]) -> str:\\n    \\'\\'\\'Format the docs.\\'\\'\\'\\n    return \", \".join([doc.page_content for doc in docs])\\n\\nformat_docs = RunnableLambda(format_docs)\\n\\nsome_tool:\\n@tool\\ndef some_tool(x: int, y: str) -> dict:\\n    \\'\\'\\'Some_tool.\\'\\'\\'\\n    return {\"x\": x, \"y\": y}\\n\\nprompt:\\ntemplate = ChatPromptTemplate.from_messages(\\n    [(\"system\", \"You are Cat Agent 007\"), (\"human\", \"{question}\")]\\n).with_config({\"run_name\": \"my_template\", \"tags\": [\"my_template\"]})\\n\\nExample:\\nfrom langchain_core.runnables import RunnableLambda\\n\\nasync def reverse(s: str) -> str:\\n    return s[::-1]\\n\\nchain = RunnableLambda(func=reverse)\\n\\nevents = [\\n    event async for event in chain.astream_events(\"hello\", version=\"v2\")\\n]\\n\\n# will produce the following events (run_id and parent_ids\\n# have been omitted for brevity):\\n[\\n    {\\n        \"data\": {\"input\": \"hello\"},\\n        \"event\": \"on_chain_start\",\\n        \"metadata\": {},\\n        \"name\": \"reverse\",\\n        \"tags\": [],\\n    },\\n    {\\n        \"data\": {\"chunk\": \"olleh\"},\\n        \"event\": \"on_chain_stream\",\\n        \"metadata\": {},\\n        \"name\": \"reverse\",\\n        \"tags\": [],\\n    },\\n    {\\n        \"data\": {\"output\": \"olleh\"},\\n        \"event\": \"on_chain_end\",\\n        \"metadata\": {},\\n        \"name\": 
\"reverse\",\\n        \"tags\": [],\\n    },\\n]\\n\\n\\nExample: Dispatch Custom Event\\nfrom langchain_core.callbacks.manager import (\\n    adispatch_custom_event,\\n)\\nfrom langchain_core.runnables import RunnableLambda, RunnableConfig\\nimport asyncio\\n\\n\\nasync def slow_thing(some_input: str, config: RunnableConfig) -> str:\\n    \"\"\"Do something that takes a long time.\"\"\"\\n    await asyncio.sleep(1) # Placeholder for some slow operation\\n    await adispatch_custom_event(\\n        \"progress_event\",\\n        {\"message\": \"Finished step 1 of 3\"},\\n        config=config # Must be included for python < 3.10\\n    )\\n    await asyncio.sleep(1) # Placeholder for some slow operation\\n    await adispatch_custom_event(\\n        \"progress_event\",\\n        {\"message\": \"Finished step 2 of 3\"},\\n        config=config # Must be included for python < 3.10\\n    )\\n    await asyncio.sleep(1) # Placeholder for some slow operation\\n    return \"Done\"\\n\\nslow_thing = RunnableLambda(slow_thing)\\n\\nasync for event in slow_thing.astream_events(\"some_input\", version=\"v2\"):\\n    print(event)\\n\\n\\n\\nParameters:\\n\\ninput (Any) ‚Äì The input to the Runnable.\\nconfig (RunnableConfig | None) ‚Äì The config to use for the Runnable.\\nversion (Literal[\\'v1\\', \\'v2\\']) ‚Äì The version of the schema to use either v2 or v1.\\nUsers should use v2.\\nv1 is for backwards compatibility and will be deprecated\\nin 0.4.0.\\nNo default will be assigned until the API is stabilized.\\ncustom events will only be surfaced in v2.\\ninclude_names (Sequence[str] | None) ‚Äì Only include events from runnables with matching names.\\ninclude_types (Sequence[str] | None) ‚Äì Only include events from runnables with matching types.\\ninclude_tags (Sequence[str] | None) ‚Äì Only include events from runnables with matching tags.\\nexclude_names (Sequence[str] | None) ‚Äì Exclude events from runnables with matching names.\\nexclude_types (Sequence[str] | None) 
– Exclude events from runnables with matching types.\\nexclude_tags (Sequence[str] | None) – Exclude events from runnables with matching tags.\\nkwargs (Any) – Additional keyword arguments to pass to the Runnable.\\nThese will be passed to astream_log as this implementation\\nof astream_events is built on top of astream_log.\\n\\nYields:\\nAn async stream of StreamEvents.\\n\\nRaises:\\nNotImplementedError – If the version is not v1 or v2.\\n\\nReturn type:\\nAsyncIterator[StandardStreamEvent | CustomStreamEvent]\\n\\nbatch(inputs: list[Input], config: RunnableConfig | list[RunnableConfig] | None = None, *, return_exceptions: bool = False, **kwargs: Any | None) → list[Output]\\nDefault implementation runs invoke in parallel using a thread pool executor.\\nThe default implementation of batch works well for IO bound runnables.\\nSubclasses should override this method if they can batch more efficiently;\\ne.g., if the underlying Runnable uses an API which supports a batch mode.\\n\\nParameters:\\n\\ninputs (list[Input])\\nconfig (RunnableConfig | list[RunnableConfig] | None)\\nreturn_exceptions (bool)\\nkwargs (Any | None)\\n\\nReturn type:\\nlist[Output]\\n\\nbatch_as_completed(inputs: Sequence[Input], config: RunnableConfig | Sequence[RunnableConfig] | None = None, *, return_exceptions: bool = False, **kwargs: Any | None) → Iterator[tuple[int, Output | Exception]]\\nRun invoke in parallel on a list of inputs,\\nyielding results as they complete.\\n\\nParameters:\\n\\ninputs (Sequence[Input])\\nconfig (RunnableConfig | Sequence[RunnableConfig] | None)\\nreturn_exceptions (bool)\\nkwargs (Any | None)\\n\\nReturn type:\\nIterator[tuple[int, Output | Exception]]\\n\\nbind(**kwargs: Any) → Runnable[Input, Output]\\nBind arguments to a Runnable, returning a new Runnable.\\nUseful when a Runnable in a chain requires an argument that is not\\nin the output of the previous Runnable or included in the user 
input.\\n\\nParameters:\\nkwargs (Any) – The arguments to bind to the Runnable.\\n\\nReturns:\\nA new Runnable with the arguments bound.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\nExample:\\nfrom langchain_community.chat_models import ChatOllama\\nfrom langchain_core.output_parsers import StrOutputParser\\n\\nllm = ChatOllama(model=\\'llama2\\')\\n\\n# Without bind.\\nchain = (\\n    llm\\n    | StrOutputParser()\\n)\\n\\nchain.invoke(\"Repeat quoted words exactly: \\'One two three four five.\\'\")\\n# Output is \\'One two three four five.\\'\\n\\n# With bind.\\nchain = (\\n    llm.bind(stop=[\"three\"])\\n    | StrOutputParser()\\n)\\n\\nchain.invoke(\"Repeat quoted words exactly: \\'One two three four five.\\'\")\\n# Output is \\'One two\\'\\n\\nconfigurable_alternatives(which: ConfigurableField, *, default_key: str = \\'default\\', prefix_keys: bool = False, **kwargs: Runnable[Input, Output] | Callable[[], Runnable[Input, Output]]) → RunnableSerializable\\nConfigure alternatives for Runnables that can be set at runtime.\\n\\nParameters:\\n\\nwhich (ConfigurableField) – The ConfigurableField instance that will be used to select the\\nalternative.\\ndefault_key (str) – The default key to use if no alternative is selected.\\nDefaults to “default”.\\nprefix_keys (bool) – Whether to prefix the keys with the ConfigurableField id.\\nDefaults to False.\\n**kwargs (Runnable[Input, Output] | Callable[[], Runnable[Input, Output]]) – A dictionary of keys to Runnable instances or callables that\\nreturn Runnable instances.\\n\\nReturns:\\nA new Runnable with the alternatives configured.\\n\\nReturn type:\\nRunnableSerializable\\n\\nfrom langchain_anthropic import ChatAnthropic\\nfrom langchain_core.runnables.utils import ConfigurableField\\nfrom langchain_openai import ChatOpenAI\\n\\nmodel = ChatAnthropic(\\n    model_name=\"claude-3-sonnet-20240229\"\\n).configurable_alternatives(\\n    ConfigurableField(id=\"llm\"),\\n    
default_key=\"anthropic\",\\n    openai=ChatOpenAI()\\n)\\n\\n# uses the default model ChatAnthropic\\nprint(model.invoke(\"which organization created you?\").content)\\n\\n# uses ChatOpenAI\\nprint(\\n    model.with_config(\\n        configurable={\"llm\": \"openai\"}\\n    ).invoke(\"which organization created you?\").content\\n)\\n\\nconfigurable_fields(**kwargs: ConfigurableField | ConfigurableFieldSingleOption | ConfigurableFieldMultiOption) → RunnableSerializable\\nConfigure particular Runnable fields at runtime.\\n\\nParameters:\\n**kwargs (ConfigurableField | ConfigurableFieldSingleOption | ConfigurableFieldMultiOption) – A dictionary of ConfigurableField instances to configure.\\n\\nReturns:\\nA new Runnable with the fields configured.\\n\\nReturn type:\\nRunnableSerializable\\n\\nfrom langchain_core.runnables import ConfigurableField\\nfrom langchain_openai import ChatOpenAI\\n\\nmodel = ChatOpenAI(max_tokens=20).configurable_fields(\\n    max_tokens=ConfigurableField(\\n        id=\"output_token_number\",\\n        name=\"Max tokens in the output\",\\n        description=\"The maximum number of tokens in the output\",\\n    )\\n)\\n\\n# max_tokens = 20\\nprint(\\n    \"max_tokens_20: \",\\n    model.invoke(\"tell me something about chess\").content\\n)\\n\\n# max_tokens = 200\\nprint(\"max_tokens_200: \", model.with_config(\\n    configurable={\"output_token_number\": 200}\\n    ).invoke(\"tell me something about chess\").content\\n)\\n\\nget_format_instructions() → str\\nReturn the format instructions for the JSON output.\\n\\nReturns:\\nThe format instructions for the JSON output.\\n\\nReturn type:\\nstr\\n\\ninvoke(input: str | BaseMessage, config: RunnableConfig | None = None, **kwargs: Any) → T\\nTransform a single input into an output. 
Override to implement.\\n\\nParameters:\\n\\ninput (str | BaseMessage) – The input to the Runnable.\\nconfig (RunnableConfig | None) – A config to use when invoking the Runnable.\\nThe config supports standard keys like ‘tags’, ‘metadata’ for tracing\\npurposes, ‘max_concurrency’ for controlling how much work to do\\nin parallel, and other keys. Please refer to the RunnableConfig\\nfor more details.\\nkwargs (Any)\\n\\nReturns:\\nThe output of the Runnable.\\n\\nReturn type:\\nT\\n\\nparse(text: str) → TBaseModel\\nParse the output of an LLM call to a pydantic object.\\n\\nParameters:\\ntext (str) – The output of the LLM call.\\n\\nReturns:\\nThe parsed pydantic object.\\n\\nReturn type:\\nTBaseModel\\n\\nparse_result(result: list[Generation], *, partial: bool = False) → TBaseModel | None\\nParse the result of an LLM call to a pydantic object.\\n\\nParameters:\\n\\nresult (list[Generation]) – The result of the LLM call.\\npartial (bool) – Whether to parse partial JSON objects.\\nIf True, the output will be a JSON object containing\\nall the keys that have been returned so far.\\nDefaults to False.\\n\\nReturns:\\nThe parsed pydantic object.\\n\\nReturn type:\\nTBaseModel | None\\n\\nparse_with_prompt(completion: str, prompt: PromptValue) → Any\\nParse the output of an LLM call with the input prompt for context.\\nThe prompt is largely provided in the event the OutputParser wants\\nto retry or fix the output in some way, and needs information from\\nthe prompt to do so.\\n\\nParameters:\\n\\ncompletion (str) – String output of a language model.\\nprompt (PromptValue) – Input PromptValue.\\n\\nReturns:\\nStructured output.\\n\\nReturn type:\\nAny\\n\\nstream(input: Input, config: RunnableConfig | None = None, **kwargs: Any | None) → Iterator[Output]\\nDefault implementation of stream, which calls invoke.\\nSubclasses should override this method if they 
support streaming output.\\n\\nParameters:\\n\\ninput (Input) – The input to the Runnable.\\nconfig (RunnableConfig | None) – The config to use for the Runnable. Defaults to None.\\nkwargs (Any | None) – Additional keyword arguments to pass to the Runnable.\\n\\nYields:\\nThe output of the Runnable.\\n\\nReturn type:\\nIterator[Output]\\n\\nwith_alisteners(*, on_start: AsyncListener | None = None, on_end: AsyncListener | None = None, on_error: AsyncListener | None = None) → Runnable[Input, Output]\\nBind asynchronous lifecycle listeners to a Runnable, returning a new Runnable.\\non_start: Asynchronously called before the Runnable starts running.\\non_end: Asynchronously called after the Runnable finishes running.\\non_error: Asynchronously called if the Runnable throws an error.\\nThe Run object contains information about the run, including its id,\\ntype, input, output, error, start_time, end_time, and any tags or metadata\\nadded to the run.\\n\\nParameters:\\n\\non_start (Optional[AsyncListener]) – Asynchronously called before the Runnable starts running.\\nDefaults to None.\\non_end (Optional[AsyncListener]) – Asynchronously called after the Runnable finishes running.\\nDefaults to None.\\non_error (Optional[AsyncListener]) – Asynchronously called if the Runnable throws an error.\\nDefaults to None.\\n\\nReturns:\\nA new Runnable with the listeners bound.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\nExample:\\nfrom langchain_core.runnables import Runnable, RunnableLambda\\nimport asyncio\\nimport time\\n\\nasync def test_runnable(time_to_sleep : int):\\n    print(f\"Runnable[{time_to_sleep}s]: starts at {format_t(time.time())}\")\\n    await asyncio.sleep(time_to_sleep)\\n    print(f\"Runnable[{time_to_sleep}s]: ends at {format_t(time.time())}\")\\n\\nasync def fn_start(run_obj : Runnable):\\n    print(f\"on start callback starts at {format_t(time.time())}\")\\n    await asyncio.sleep(3)\\n    print(f\"on start callback ends at 
{format_t(time.time())}\")\\n\\nasync def fn_end(run_obj : Runnable):\\n    print(f\"on end callback starts at {format_t(time.time())}\")\\n    await asyncio.sleep(2)\\n    print(f\"on end callback ends at {format_t(time.time())}\")\\n\\nrunnable = RunnableLambda(test_runnable).with_alisteners(\\n    on_start=fn_start,\\n    on_end=fn_end\\n)\\nasync def concurrent_runs():\\n    await asyncio.gather(runnable.ainvoke(2), runnable.ainvoke(3))\\n\\nasyncio.run(concurrent_runs())\\nResult:\\non start callback starts at 2024-05-16T14:20:29.637053+00:00\\non start callback starts at 2024-05-16T14:20:29.637150+00:00\\non start callback ends at 2024-05-16T14:20:32.638305+00:00\\non start callback ends at 2024-05-16T14:20:32.638383+00:00\\nRunnable[3s]: starts at 2024-05-16T14:20:32.638849+00:00\\nRunnable[5s]: starts at 2024-05-16T14:20:32.638999+00:00\\nRunnable[3s]: ends at 2024-05-16T14:20:35.640016+00:00\\non end callback starts at 2024-05-16T14:20:35.640534+00:00\\nRunnable[5s]: ends at 2024-05-16T14:20:37.640169+00:00\\non end callback starts at 2024-05-16T14:20:37.640574+00:00\\non end callback ends at 2024-05-16T14:20:37.640654+00:00\\non end callback ends at 2024-05-16T14:20:39.641751+00:00\\n\\nwith_config(config: RunnableConfig | None = None, **kwargs: Any) → Runnable[Input, Output]\\nBind config to a Runnable, returning a new Runnable.\\n\\nParameters:\\n\\nconfig (RunnableConfig | None) – The config to bind to the Runnable.\\nkwargs (Any) – Additional keyword arguments to pass to the Runnable.\\n\\nReturns:\\nA new Runnable with the config bound.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\nwith_fallbacks(fallbacks: Sequence[Runnable[Input, Output]], *, exceptions_to_handle: tuple[type[BaseException], ...] 
= (Exception,), exception_key: Optional[str] = None) → RunnableWithFallbacksT[Input, Output]\\nAdd fallbacks to a Runnable, returning a new Runnable.\\nThe new Runnable will try the original Runnable, and then each fallback\\nin order, upon failures.\\n\\nParameters:\\n\\nfallbacks (Sequence[Runnable[Input, Output]]) – A sequence of runnables to try if the original Runnable fails.\\nexceptions_to_handle (tuple[type[BaseException], ...]) – A tuple of exception types to handle.\\nDefaults to (Exception,).\\nexception_key (Optional[str]) – If a string is specified then handled exceptions will be passed\\nto fallbacks as part of the input under the specified key. If None,\\nexceptions will not be passed to fallbacks. If used, the base Runnable\\nand its fallbacks must accept a dictionary as input. Defaults to None.\\n\\nReturns:\\nA new Runnable that will try the original Runnable, and then each\\nfallback in order, upon failures.\\n\\nReturn type:\\nRunnableWithFallbacksT[Input, Output]\\n\\nExample:\\nfrom typing import Iterator\\n\\nfrom langchain_core.runnables import RunnableGenerator\\n\\n\\ndef _generate_immediate_error(input: Iterator) -> Iterator[str]:\\n    raise ValueError()\\n    yield \"\"\\n\\n\\ndef _generate(input: Iterator) -> Iterator[str]:\\n    yield from \"foo bar\"\\n\\n\\nrunnable = RunnableGenerator(_generate_immediate_error).with_fallbacks(\\n    [RunnableGenerator(_generate)]\\n    )\\nprint(\\'\\'.join(runnable.stream({}))) #foo bar\\n\\nParameters:\\n\\nfallbacks (Sequence[Runnable[Input, Output]]) – A sequence of runnables to try if the original Runnable fails.\\nexceptions_to_handle (tuple[type[BaseException], ...]) – A tuple of exception types to handle.\\nexception_key (Optional[str]) – If a string is specified then handled exceptions will be passed\\nto fallbacks as part of the input under the specified key. If None,\\nexceptions will not be passed to fallbacks. 
If used, the base Runnable\\nand its fallbacks must accept a dictionary as input.\\n\\nReturns:\\nA new Runnable that will try the original Runnable, and then each\\nfallback in order, upon failures.\\n\\nReturn type:\\nRunnableWithFallbacksT[Input, Output]\\n\\nwith_listeners(*, on_start: Callable[[Run], None] | Callable[[Run, RunnableConfig], None] | None = None, on_end: Callable[[Run], None] | Callable[[Run, RunnableConfig], None] | None = None, on_error: Callable[[Run], None] | Callable[[Run, RunnableConfig], None] | None = None) → Runnable[Input, Output]\\nBind lifecycle listeners to a Runnable, returning a new Runnable.\\non_start: Called before the Runnable starts running, with the Run object.\\non_end: Called after the Runnable finishes running, with the Run object.\\non_error: Called if the Runnable throws an error, with the Run object.\\nThe Run object contains information about the run, including its id,\\ntype, input, output, error, start_time, end_time, and any tags or metadata\\nadded to the run.\\n\\nParameters:\\n\\non_start (Optional[Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]]) – Called before the Runnable starts running. Defaults to None.\\non_end (Optional[Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]]) – Called after the Runnable finishes running. Defaults to None.\\non_error (Optional[Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]]) – Called if the Runnable throws an error. 
Defaults to None.\\n\\nReturns:\\nA new Runnable with the listeners bound.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\nExample:\\nfrom langchain_core.runnables import RunnableLambda\\nfrom langchain_core.tracers.schemas import Run\\n\\nimport time\\n\\ndef test_runnable(time_to_sleep : int):\\n    time.sleep(time_to_sleep)\\n\\ndef fn_start(run_obj: Run):\\n    print(\"start_time:\", run_obj.start_time)\\n\\ndef fn_end(run_obj: Run):\\n    print(\"end_time:\", run_obj.end_time)\\n\\nchain = RunnableLambda(test_runnable).with_listeners(\\n    on_start=fn_start,\\n    on_end=fn_end\\n)\\nchain.invoke(2)\\n\\nwith_retry(*, retry_if_exception_type: tuple[type[BaseException], ...] = (Exception,), wait_exponential_jitter: bool = True, stop_after_attempt: int = 3) → Runnable[Input, Output]\\nCreate a new Runnable that retries the original Runnable on exceptions.\\n\\nParameters:\\n\\nretry_if_exception_type (tuple[type[BaseException], ...]) – A tuple of exception types to retry on.\\nDefaults to (Exception,).\\nwait_exponential_jitter (bool) – Whether to add jitter to the wait\\ntime between retries. Defaults to True.\\nstop_after_attempt (int) – The maximum number of attempts to make before\\ngiving up. 
Defaults to 3.\\n\\n\\nReturns:\\nA new Runnable that retries the original Runnable on exceptions.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\n\\nExample:\\nfrom langchain_core.runnables import RunnableLambda\\n\\ncount = 0\\n\\n\\ndef _lambda(x: int) -> None:\\n    global count\\n    count = count + 1\\n    if x == 1:\\n        raise ValueError(\"x is 1\")\\n    else:\\n         pass\\n\\n\\nrunnable = RunnableLambda(_lambda)\\ntry:\\n    runnable.with_retry(\\n        stop_after_attempt=2,\\n        retry_if_exception_type=(ValueError,),\\n    ).invoke(1)\\nexcept ValueError:\\n    pass\\n\\nassert (count == 2)\\n\\n\\n\\nParameters:\\n\\nretry_if_exception_type (tuple[type[BaseException], ...]) ‚Äì A tuple of exception types to retry on\\nwait_exponential_jitter (bool) ‚Äì Whether to add jitter to the wait time\\nbetween retries\\nstop_after_attempt (int) ‚Äì The maximum number of attempts to make before giving up\\n\\n\\nReturns:\\nA new Runnable that retries the original Runnable on exceptions.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\n\\n\\n\\n\\nwith_types(*, input_type: type[Input] | None = None, output_type: type[Output] | None = None) ‚Üí Runnable[Input, Output]#\\nBind input and output types to a Runnable, returning a new Runnable.\\n\\nParameters:\\n\\ninput_type (type[Input] | None) ‚Äì The input type to bind to the Runnable. Defaults to None.\\noutput_type (type[Output] | None) ‚Äì The output type to bind to the Runnable. 
Defaults to None.\\n\\n\\nReturns:\\nA new Runnable with the types bound.\\n\\nReturn type:\\nRunnable[Input, Output]\\n\\n\\n\\n\\nExamples using PydanticOutputParser\\n\\nGenerate Synthetic Data\\nHow to retry when a parsing error occurs\\nHow to return structured data from a model\\nHow to use output parsers to parse an LLM response into structured format\\nHow to use prompting alone (no tool calling) to do extraction\\nHow to use the output-fixing parser\\n\\n\\n\\n\\n\\n\\n\\n\\n On this page\\n  \\n\\n\\nPydanticOutputParser\\ndiff\\npydantic_object\\nabatch()\\nabatch_as_completed()\\nainvoke()\\naparse()\\naparse_result()\\nastream()\\nastream_events()\\nbatch()\\nbatch_as_completed()\\nbind()\\nconfigurable_alternatives()\\nconfigurable_fields()\\nget_format_instructions()\\ninvoke()\\nparse()\\nparse_result()\\nparse_with_prompt()\\nstream()\\nwith_alisteners()\\nwith_config()\\nwith_fallbacks()\\nwith_listeners()\\nwith_retry()\\nwith_types()\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n    \\n      ¬© Copyright 2023, LangChain Inc.\\n      \\n\\n\\n\\n\\n\\n\\n',\n",
       " '\\n\\n\\n\\n\\nHow to do \"self-querying\" retrieval | \\uf8ffü¶úÔ∏è\\uf8ffüîó LangChain\\n\\n\\n\\n\\n\\n\\nSkip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1\\uf8ffüí¨SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval Augmented Generation (RAG) App: Part 2Build an Extraction ChainBuild an AgentTaggingBuild a Retrieval Augmented Generation (RAG) App: Part 1Build a semantic search engineBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow-to guidesHow to use tools in a chainHow to use a vectorstore as a retrieverHow to add memory to chatbotsHow to use example selectorsHow to add a semantic layer over graph databaseHow to invoke runnables in parallelHow to stream chat model responsesHow to add default invocation args to a RunnableHow to add retrieval to chatbotsHow to use few shot examples in chat modelsHow to do tool/function callingHow to install LangChain packagesHow to add examples to the prompt for query analysisHow to use few shot examplesHow to run custom functionsHow to use output parsers to parse an LLM response into structured formatHow to handle cases where no queries are generatedHow to route between sub-chainsHow to return structured data from a modelHow to summarize text through parallelizationHow to summarize text through iterative refinementHow to summarize text in a single LLM callHow to use toolkitsHow to add ad-hoc tool calling capability to LLMs and Chat ModelsBuild an Agent with AgentExecutor (Legacy)How to construct knowledge graphsHow to partially format prompt templatesHow to handle multiple queries when doing query analysisHow to use built-in tools and toolkitsHow to pass through arguments from one step to the nextHow to compose prompts togetherHow to handle multiple 
retrievers when doing query analysisHow to add values to a chain\\'s stateHow to construct filters for query analysisHow to configure runtime chain internalsHow deal with high cardinality categoricals when doing query analysisCustom Document LoaderHow to use the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callbacks in async environmentsHow to attach callbacks to a runnableHow to propagate callbacks  constructorHow to dispatch custom callback eventsHow to pass callbacks in at runtimeHow to split by characterHow to cache chat model responsesHow to handle rate limitsHow to init any model in one lineHow to track token usage in ChatModelsHow to add tools to chatbotsHow to split codeHow to do retrieval with contextual compressionHow to convert Runnables to ToolsHow to create custom callback handlersHow to create a custom chat model classCustom EmbeddingsHow to create a custom LLM classCustom RetrieverHow to create toolsHow to debug your LLM appsHow to load CSVsHow to load documents from a directoryHow to load HTMLHow to load JSONHow to load MarkdownHow to load Microsoft Office filesHow to load PDFsHow to load web pagesHow to create a dynamic (self-constructing) chainText embedding modelsHow to combine results from multiple retrieversHow to select examples from a LangSmith datasetHow to select examples by lengthHow to select examples by maximal marginal relevance (MMR)How to select examples by n-gram overlapHow to select examples by similarityHow to use reference examples when doing extractionHow to handle long text when doing extractionHow to use prompting alone (no tool calling) to do extractionHow to add fallbacks to a runnableHow to filter messagesHybrid SearchHow to use the LangChain indexing APIHow to inspect runnablesLangChain Expression Language CheatsheetHow to cache LLM responsesHow to track token usage for LLMsRun models locallyHow to get log probabilitiesHow to reorder retrieved results to mitigate the \"lost in the middle\" 
effectHow to split Markdown by HeadersHow to merge consecutive messages of the same typeHow to add message historyHow to migrate from legacy LangChain agents to LangGraphHow to retrieve using multiple vectors per documentHow to pass multimodal data directly to modelsHow to use multimodal promptsHow to create a custom Output ParserHow to use the output-fixing parserHow to parse JSON outputHow to retry when a parsing error occursHow to parse text from message objectsHow to parse XML outputHow to parse YAML outputHow to use the Parent Document RetrieverHow to use LangChain with different Pydantic versionsHow to add chat historyHow to get a RAG application to add citationsHow to do per-user retrievalHow to get your RAG application to return sourcesHow to stream results from your RAG applicationHow to split JSON dataHow to recursively split text by charactersResponse metadataHow to pass runtime secrets to runnablesHow to do \"self-querying\" retrievalHow to split text based on semantic similarityHow to chain runnablesHow to save and load LangChain objectsHow to split text by tokensHow to split HTMLHow to do question answering over CSVsHow to deal with large databases when doing SQL question-answeringHow to better prompt when doing SQL question-answeringHow to do query validation as part of SQL question-answeringHow to stream runnablesHow to stream responses from an LLMHow to use a time-weighted vector store retrieverHow to return artifacts from a toolHow to use chat models to call toolsHow to disable parallel tool callingHow to force models to call a toolHow to access the RunnableConfig from a toolHow to pass tool outputs to chat modelsHow to pass run time values to toolsHow to stream events from a toolHow to stream tool callsHow to convert tools to OpenAI FunctionsHow to handle tool errorsHow to use few-shot prompting with tool callingHow to add a human-in-the-loop for toolsHow to bind model-specific toolsHow to trim messagesHow to create and query vector 
storesConceptual guideAgentsArchitectureAsync programming with langchainCallbacksChat historyChat modelsDocument loadersEmbedding modelsEvaluationExample selectorsFew-shot promptingConceptual guideKey-value storesLangChain Expression Language (LCEL)MessagesMultimodalityOutput parsersPrompt TemplatesRetrieval augmented generation (RAG)RetrievalRetrieversRunnable interfaceStreamingStructured outputsTestingString-in, string-out llmsText splittersTokensTool callingToolsTracingVector storesWhy LangChain?Ecosystem\\uf8ffü¶ú\\uf8ffüõ†Ô∏è LangSmith\\uf8ffü¶ú\\uf8ffüï∏Ô∏è LangGraphVersionsv0.3v0.2Pydantic compatibilityMigrating from v0.0 chainsHow to migrate from v0.0 chainsMigrating from ConstitutionalChainMigrating from ConversationalChainMigrating from ConversationalRetrievalChainMigrating from LLMChainMigrating from LLMMathChainMigrating from LLMRouterChainMigrating from MapReduceDocumentsChainMigrating from MapRerankDocumentsChainMigrating from MultiPromptChainMigrating from RefineDocumentsChainMigrating from RetrievalQAMigrating from StuffDocumentsChainUpgrading to LangGraph memoryHow to migrate to LangGraph memoryHow to use BaseChatMessageHistory with LangGraphMigrating off ConversationBufferMemory or ConversationStringBufferMemoryMigrating off ConversationBufferWindowMemory or ConversationTokenBufferMemoryMigrating off ConversationSummaryMemory or ConversationSummaryBufferMemoryA Long-Term Memory AgentRelease policySecurity PolicyHow-to guidesHow to do \"self-querying\" retrievalOn this pageHow to do \"self-querying\" retrieval\\ninfoHead to Integrations for documentation on vector stores with built-in support for self-querying.\\nA self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying vector store. 
This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.\\n\\nGet started‚Äã\\nFor demonstration purposes we\\'ll use a Chroma vector store. We\\'ve created a small demo set of documents that contain summaries of movies.\\nNote: The self-query retriever requires you to have lark package installed.\\n%pip install --upgrade --quiet  lark langchain-chroma\\nfrom langchain_chroma import Chromafrom langchain_core.documents import Documentfrom langchain_openai import OpenAIEmbeddingsdocs = [    Document(        page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",        metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},    ),    Document(        page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",        metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},    ),    Document(        page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",        metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},    ),    Document(        page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",        metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},    ),    Document(        page_content=\"Toys come alive and have a blast doing so\",        metadata={\"year\": 1995, \"genre\": \"animated\"},    ),    Document(        page_content=\"Three men walk into the Zone, three men walk out of the Zone\",        metadata={            \"year\": 1979,            \"director\": \"Andrei Tarkovsky\",            \"genre\": \"thriller\",            \"rating\": 9.9,        },    ),]vectorstore = Chroma.from_documents(docs, 
OpenAIEmbeddings())API Reference:Document | OpenAIEmbeddings\\nCreating our self-querying retriever‚Äã\\nNow we can instantiate our retriever. To do this we\\'ll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.\\nfrom langchain.chains.query_constructor.schema import AttributeInfofrom langchain.retrievers.self_query.base import SelfQueryRetrieverfrom langchain_openai import ChatOpenAImetadata_field_info = [    AttributeInfo(        name=\"genre\",        description=\"The genre of the movie. One of [\\'science fiction\\', \\'comedy\\', \\'drama\\', \\'thriller\\', \\'romance\\', \\'action\\', \\'animated\\']\",        type=\"string\",    ),    AttributeInfo(        name=\"year\",        description=\"The year the movie was released\",        type=\"integer\",    ),    AttributeInfo(        name=\"director\",        description=\"The name of the movie director\",        type=\"string\",    ),    AttributeInfo(        name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"    ),]document_content_description = \"Brief summary of a movie\"llm = ChatOpenAI(temperature=0)retriever = SelfQueryRetriever.from_llm(    llm,    vectorstore,    document_content_description,    metadata_field_info,)API Reference:AttributeInfo | SelfQueryRetriever | ChatOpenAI\\nTesting it out‚Äã\\nAnd now we can actually try using our retriever!\\n# This example only specifies a filterretriever.invoke(\"I want to watch a movie rated higher than 8.5\")\\n[Document(page_content=\\'Three men walk into the Zone, three men walk out of the Zone\\', metadata={\\'director\\': \\'Andrei Tarkovsky\\', \\'genre\\': \\'thriller\\', \\'rating\\': 9.9, \\'year\\': 1979}), Document(page_content=\\'A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\\', metadata={\\'director\\': \\'Satoshi Kon\\', \\'rating\\': 8.6, \\'year\\': 
2006})]\\n# This example specifies a query and a filterretriever.invoke(\"Has Greta Gerwig directed any movies about women\")\\n[Document(page_content=\\'A bunch of normal-sized women are supremely wholesome and some men pine after them\\', metadata={\\'director\\': \\'Greta Gerwig\\', \\'rating\\': 8.3, \\'year\\': 2019})]\\n# This example specifies a composite filterretriever.invoke(\"What\\'s a highly rated (above 8.5) science fiction film?\")\\n[Document(page_content=\\'A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\\', metadata={\\'director\\': \\'Satoshi Kon\\', \\'rating\\': 8.6, \\'year\\': 2006}), Document(page_content=\\'Three men walk into the Zone, three men walk out of the Zone\\', metadata={\\'director\\': \\'Andrei Tarkovsky\\', \\'genre\\': \\'thriller\\', \\'rating\\': 9.9, \\'year\\': 1979})]\\n# This example specifies a query and composite filterretriever.invoke(    \"What\\'s a movie after 1990 but before 2005 that\\'s all about toys, and preferably is animated\")\\n[Document(page_content=\\'Toys come alive and have a blast doing so\\', metadata={\\'genre\\': \\'animated\\', \\'year\\': 1995})]\\nFilter k‚Äã\\nWe can also use the self query retriever to specify k: the number of documents to fetch.\\nWe can do this by passing enable_limit=True to the constructor.\\nretriever = SelfQueryRetriever.from_llm(    llm,    vectorstore,    document_content_description,    metadata_field_info,    enable_limit=True,)# This example only specifies a relevant queryretriever.invoke(\"What are two movies about dinosaurs\")\\n[Document(page_content=\\'A bunch of scientists bring back dinosaurs and mayhem breaks loose\\', metadata={\\'genre\\': \\'science fiction\\', \\'rating\\': 7.7, \\'year\\': 1993}), Document(page_content=\\'Toys come alive and have a blast doing so\\', metadata={\\'genre\\': \\'animated\\', \\'year\\': 1995})]\\nConstructing from scratch with LCEL‚Äã\\nTo see what\\'s 
going on under the hood, and to have more custom control, we can reconstruct our retriever from scratch.\\nFirst, we need to create a query-construction chain. This chain will take a user query and generated a StructuredQuery object which captures the filters specified by the user. We provide some helper functions for creating a prompt and output parser. These have a number of tunable params that we\\'ll ignore here for simplicity.\\nfrom langchain.chains.query_constructor.base import (    StructuredQueryOutputParser,    get_query_constructor_prompt,)prompt = get_query_constructor_prompt(    document_content_description,    metadata_field_info,)output_parser = StructuredQueryOutputParser.from_components()query_constructor = prompt | llm | output_parserAPI Reference:StructuredQueryOutputParser | get_query_constructor_prompt\\nLet\\'s look at our prompt:\\nprint(prompt.format(query=\"dummy question\"))\\nYour goal is to structure the user\\'s query to match the request schema provided below.<< Structured Request Schema >>When responding use a markdown code snippet with a JSON object formatted in the following schema:\\\\`\\\\`\\\\`json{    \"query\": string \\\\ text string to compare to document contents    \"filter\": string \\\\ logical condition statement for filtering documents}\\\\`\\\\`\\\\`The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as well.A logical condition statement is composed of one or more comparison and logical operation statements.A comparison statement takes the form: `comp(attr, val)`:- `comp` (eq | ne | gt | gte | lt | lte | contain | like | in | nin): comparator- `attr` (string):  name of attribute to apply the comparison to- `val` (string): is the comparison valueA logical operation statement takes the form `op(statement1, statement2, ...)`:- `op` (and | or | not): logical operator- `statement1`, `statement2`, ... 
(comparison statements or logical operation statements): one or more statements to apply the operation toMake sure that you only use the comparators and logical operators listed above and no others.Make sure that filters only refer to attributes that exist in the data source.Make sure that filters only use the attributed names with its function names if there are functions applied on them.Make sure that filters only use format `YYYY-MM-DD` when handling date data typed values.Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.Make sure that filters are only used as needed. If there are no filters that should be applied return \"NO_FILTER\" for the filter value.<< Example 1. >>Data Source:\\\\`\\\\`\\\\`json{    \"content\": \"Lyrics of a song\",    \"attributes\": {        \"artist\": {            \"type\": \"string\",            \"description\": \"Name of the song artist\"        },        \"length\": {            \"type\": \"integer\",            \"description\": \"Length of the song in seconds\"        },        \"genre\": {            \"type\": \"string\",            \"description\": \"The song genre, one of \"pop\", \"rock\" or \"rap\"\"        }    }}\\\\`\\\\`\\\\`User Query:What are songs by Taylor Swift or Katy Perry about teenage romance under 3 minutes long in the dance pop genreStructured Request:\\\\`\\\\`\\\\`json{    \"query\": \"teenager love\",    \"filter\": \"and(or(eq(\\\\\"artist\\\\\", \\\\\"Taylor Swift\\\\\"), eq(\\\\\"artist\\\\\", \\\\\"Katy Perry\\\\\")), lt(\\\\\"length\\\\\", 180), eq(\\\\\"genre\\\\\", \\\\\"pop\\\\\"))\"}\\\\`\\\\`\\\\`<< Example 2. 
>>Data Source:\\\\`\\\\`\\\\`json{    \"content\": \"Lyrics of a song\",    \"attributes\": {        \"artist\": {            \"type\": \"string\",            \"description\": \"Name of the song artist\"        },        \"length\": {            \"type\": \"integer\",            \"description\": \"Length of the song in seconds\"        },        \"genre\": {            \"type\": \"string\",            \"description\": \"The song genre, one of \"pop\", \"rock\" or \"rap\"\"        }    }}\\\\`\\\\`\\\\`User Query:What are songs that were not published on SpotifyStructured Request:\\\\`\\\\`\\\\`json{    \"query\": \"\",    \"filter\": \"NO_FILTER\"}\\\\`\\\\`\\\\`<< Example 3. >>Data Source:\\\\`\\\\`\\\\`json{    \"content\": \"Brief summary of a movie\",    \"attributes\": {    \"genre\": {        \"description\": \"The genre of the movie. One of [\\'science fiction\\', \\'comedy\\', \\'drama\\', \\'thriller\\', \\'romance\\', \\'action\\', \\'animated\\']\",        \"type\": \"string\"    },    \"year\": {        \"description\": \"The year the movie was released\",        \"type\": \"integer\"    },    \"director\": {        \"description\": \"The name of the movie director\",        \"type\": \"string\"    },    \"rating\": {        \"description\": \"A 1-10 rating for the movie\",        \"type\": \"float\"    }}}\\\\`\\\\`\\\\`User Query:dummy questionStructured Request:\\nAnd what our full chain produces:\\nquery_constructor.invoke(    {        \"query\": \"What are some sci-fi movies from the 90\\'s directed by Luc Besson about taxi drivers\"    })\\nStructuredQuery(query=\\'taxi driver\\', filter=Operation(operator=<Operator.AND: \\'and\\'>, arguments=[Comparison(comparator=<Comparator.EQ: \\'eq\\'>, attribute=\\'genre\\', value=\\'science fiction\\'), Operation(operator=<Operator.AND: \\'and\\'>, arguments=[Comparison(comparator=<Comparator.GTE: \\'gte\\'>, attribute=\\'year\\', value=1990), Comparison(comparator=<Comparator.LT: \\'lt\\'>, 
attribute=\\'year\\', value=2000)]), Comparison(comparator=<Comparator.EQ: \\'eq\\'>, attribute=\\'director\\', value=\\'Luc Besson\\')]), limit=None)\\nThe query constructor is the key element of the self-query retriever. To make a great retrieval system you\\'ll need to make sure your query constructor works well. Often this requires adjusting the prompt, the examples in the prompt, the attribute descriptions, etc. For an example that walks through refining a query constructor on some hotel inventory data, check out this cookbook.\\nThe next key element is the structured query translator. This is the object responsible for translating the generic StructuredQuery object into a metadata filter in the syntax of the vector store you\\'re using. LangChain comes with a number of built-in translators. To see them all head to the Integrations section.\\nfrom langchain_community.query_constructors.chroma import ChromaTranslatorretriever = SelfQueryRetriever(    query_constructor=query_constructor,    vectorstore=vectorstore,    structured_query_translator=ChromaTranslator(),)API Reference:ChromaTranslator\\nretriever.invoke(    \"What\\'s a movie after 1990 but before 2005 that\\'s all about toys, and preferably is animated\")\\n[Document(page_content=\\'Toys come alive and have a blast doing so\\', metadata={\\'genre\\': \\'animated\\', \\'year\\': 1995})]Edit this pageWas this page helpful?PreviousHow to pass runtime secrets to runnablesNextHow to split text based on semantic similarityGet startedCreating our self-querying retrieverTesting it outFilter kConstructing from scratch with LCELCommunityTwitterGitHubOrganizationPythonJS/TSMoreHomepageBlogYouTubeCopyright ¬© 2024 LangChain, Inc.\\n\\n']"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "leaf_texts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "fbd18b8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.vectorstores import FAISS\n",
    "\n",
    "# Start from a copy of the leaf texts so the original list is not modified\n",
    "all_texts = leaf_texts.copy()\n",
    "\n",
    "# Iterate over the levels in sorted order\n",
    "for level in sorted(results.keys()):\n",
    "    # Extract the summaries from the current level's DataFrame\n",
    "    summaries = results[level][1][\"summaries\"].tolist()\n",
    "    # Append the current level's summaries to all_texts\n",
    "    all_texts.extend(summaries)\n",
    "\n",
    "# Build the FAISS vectorstore from all_texts (leaf texts plus summaries)\n",
    "vectorstore = FAISS.from_texts(texts=all_texts, embedding=embeddings)"
   ]
  },
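  {
   "cell_type": "markdown",
   "id": "c4e1a7b2",
   "metadata": {},
   "source": [
    "The loop above puts the leaf texts and the summaries from every level into one flat list, which is then indexed as a whole (the RAPTOR paper calls this style of retrieval a collapsed tree). The sketch below reproduces only that flattening step on toy data. `toy_leaf_texts` and `toy_results` are hypothetical stand-ins made up for illustration; the sketch assumes the same shape as `results`, where each level maps to a pair whose second element is a DataFrame with a `summaries` column.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for the leaf documents.\n",
    "toy_leaf_texts = [\"leaf doc A\", \"leaf doc B\", \"leaf doc C\"]\n",
    "\n",
    "# Hypothetical stand-in for `results`: level -> (cluster DataFrame, summary DataFrame).\n",
    "toy_results = {\n",
    "    2: (pd.DataFrame(), pd.DataFrame({\"summaries\": [\"summary of the level-1 summaries\"]})),\n",
    "    1: (pd.DataFrame(), pd.DataFrame({\"summaries\": [\"summary of A and B\", \"summary of C\"]})),\n",
    "}\n",
    "\n",
    "# Start from a copy so the leaf list itself is left untouched.\n",
    "toy_all_texts = toy_leaf_texts.copy()\n",
    "\n",
    "# Visit the levels from lowest to highest and append each level's summaries.\n",
    "for level in sorted(toy_results.keys()):\n",
    "    toy_all_texts.extend(toy_results[level][1][\"summaries\"].tolist())\n",
    "\n",
    "# Leaves come first, followed by the summaries of each level in order.\n",
    "for text in toy_all_texts:\n",
    "    print(text)\n",
    "```\n",
    "\n",
    "Because leaves and summaries share one index, a single similarity search can return either a detailed leaf or a higher-level summary, whichever is closer to the query."
   ]
  },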
  {
   "cell_type": "markdown",
   "id": "b7eea66d",
   "metadata": {},
   "source": [
    "Save the vector store index to local disk.\n"
   ]
  },
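  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "15f67a9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "DB_INDEX = \"RAPTOR\"\n",
    "\n",
    "# If a local index already exists, load it, merge the new vectorstore into it, and save the result.\n",
    "if os.path.exists(DB_INDEX):\n",
    "    # Recent langchain_community releases require an explicit opt-in to unpickle a local index.\n",
    "    # Only enable this for index files you created yourself.\n",
    "    local_index = FAISS.load_local(\n",
    "        DB_INDEX, embeddings, allow_dangerous_deserialization=True\n",
    "    )\n",
    "    local_index.merge_from(vectorstore)\n",
    "    local_index.save_local(DB_INDEX)\n",
    "else:\n",
    "    vectorstore.save_local(folder_path=DB_INDEX)"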
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ad41f05",
   "metadata": {},
   "source": [
    "Create a `retriever` from the `vectorstore`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "398b4dab",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the retriever\n",
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6da0f7e",
   "metadata": {},
   "source": [
    "## Defining the RAG chain\n",
    "\n",
    "Now define a RAG chain on top of the vectorstore built above, run it, and check the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "a9c26dc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "# Define the prompt\n",
    "prompt = PromptTemplate.from_template(\n",
    "    \"\"\"\n",
    "    You are a code copilot assistant specializing in code-based Question-Answering (QA) tasks within a Retrieval-Augmented Generation (RAG) system.\n",
    "You are given LangChain documentation. Your primary mission is to answer questions based on provided context.\n",
    "Ensure your response is concise and directly addresses the question without any additional narration.\n",
    "\n",
    "###\n",
    "\n",
    "Your final answer should be written concisely (but include important numerical values, technical terms, jargon, and names).\n",
    "\n",
    "# Steps\n",
    "\n",
    "1. Carefully read and understand the context provided.\n",
    "2. Identify the key information related to the question within the context.\n",
    "3. Formulate a concise answer based on the relevant information.\n",
    "4. Ensure your final answer directly addresses the question.\n",
    "5. Be sure to include full example code if the question is about code.\n",
    "\n",
    "# Output Format:\n",
    "[General introduction of the answer]\n",
    "[Comprehensive answer to the question including code example]\n",
    "\n",
    "###\n",
    "\n",
    "Remember:\n",
    "- It's crucial to base your answer solely on the **PROVIDED CONTEXT**.\n",
    "- DO NOT use any external knowledge or information not present in the given materials.\n",
    "\n",
    "###\n",
    "\n",
    "# Here is the user's QUESTION that you should answer:\n",
    "{question}\n",
    "\n",
    "# Here is the CONTEXT that you should use to answer the question:\n",
    "{context}\n",
    "\n",
    "[Note]\n",
    "- Answer should be written in Korean.\n",
    "\n",
    "# Your final ANSWER to the user's QUESTION:\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "# Format the retrieved documents: wrap each one in <document> tags\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(f\"<document>{doc.page_content}</document>\" for doc in docs)\n",
    "\n",
    "\n",
    "# Define the RAG chain\n",
    "rag_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
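  {
   "cell_type": "markdown",
   "id": "d9f20b6e",
   "metadata": {},
   "source": [
    "To see what the prompt receives, the first half of the chain can be run without a vector store or an LLM. The sketch below is an illustration under assumptions: `fake_retrieve`, `fake_retriever`, and the two sample documents are hypothetical stand-ins for the FAISS retriever, and the short `toy_prompt` replaces the full prompt above. It shows how the dictionary passes the same question to both keys, so `context` receives the formatted documents while `question` passes through unchanged.\n",
    "\n",
    "```python\n",
    "from langchain_core.documents import Document\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
    "\n",
    "\n",
    "# Hypothetical retriever: ignores the question and returns two fixed documents.\n",
    "def fake_retrieve(question: str) -> list[Document]:\n",
    "    return [\n",
    "        Document(page_content=\"LCEL composes runnables with the | operator.\"),\n",
    "        Document(page_content=\"RAPTOR indexes leaf texts and their summaries.\"),\n",
    "    ]\n",
    "\n",
    "\n",
    "# Same formatting as in the notebook: wrap each document in <document> tags.\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(f\"<document>{doc.page_content}</document>\" for doc in docs)\n",
    "\n",
    "\n",
    "fake_retriever = RunnableLambda(fake_retrieve)\n",
    "toy_prompt = PromptTemplate.from_template(\"Question: {question}\\nContext:\\n{context}\")\n",
    "\n",
    "# The dict is coerced into a parallel step: both values receive the same input string.\n",
    "toy_chain = {\n",
    "    \"context\": fake_retriever | format_docs,\n",
    "    \"question\": RunnablePassthrough(),\n",
    "} | toy_prompt\n",
    "\n",
    "# Print the fully rendered prompt text that would be sent to the LLM.\n",
    "print(toy_chain.invoke(\"What is LCEL?\").to_string())\n",
    "```\n",
    "\n",
    "Inspecting the rendered prompt this way is a cheap check that the retrieved context is formatted as intended before paying for an LLM call."
   ]
  },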
  {
   "cell_type": "markdown",
   "id": "52da8b08",
   "metadata": {},
   "source": [
    "[LangSmith link](https://smith.langchain.com/public/3e459bd4-4265-4c1d-b43d-279a1204d983/r)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "e0efdda7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LangChain Expression Language (LCEL)의 핵심 주제는 LangChain 생태계 내에서 체인 또는 \"Runnable\"의 실행을 구성하고 최적화하는 선언적 프레임워크입니다. LCEL은 사용자가 체인에서 무엇을 해야 하는지를 정의할 수 있게 하여 LangChain이 실행 시간을 최적화할 수 있도록 합니다. 이는 병렬 처리나 비동기 실행이 필요한 작업에 특히 유용하며, RunnableParallel 및 Runnable Async API를 통해 이를 지원합니다. LCEL은 병렬 실행, 스트리밍, 관찰 가능성, 표준 API를 제공하며, 간단한 체인 구성 및 복잡한 체인 내 개별 노드에서 사용될 수 있습니다."
     ]
    }
   ],
   "source": [
    "# Run an abstract, high-level question (\"Please explain the core topic of the whole document.\")\n",
    "answer = rag_chain.stream(\"전체 문서의 핵심 주제에 대해 설명해주세요.\")\n",
    "stream_response(answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea6296b2",
   "metadata": {},
   "source": [
    "[LangSmith link](https://smith.langchain.com/public/c29887c7-f005-450a-a747-7d932c753721/r)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "id": "6e8193fb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "self-querying 방법은 자연어 쿼리를 받아 구조화된 쿼리를 작성하고 이를 벡터 저장소에 적용하여 문서의 메타데이터를 필터링하는 기능을 제공합니다. 아래는 self-querying을 구현하는 예시 코드입니다.\n",
      "\n",
      "```python\n",
      "# 필요한 패키지 설치\n",
      "%pip install --upgrade --quiet lark langchain-chroma\n",
      "\n",
      "from langchain_chroma import Chroma\n",
      "from langchain_core.documents import Document\n",
      "from langchain_openai import OpenAIEmbeddings\n",
      "from langchain.chains.query_constructor.schema import AttributeInfo\n",
      "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
      "from langchain_openai import ChatOpenAI\n",
      "\n",
      "# 문서 데이터 생성\n",
      "docs = [\n",
      "    Document(\n",
      "        page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
      "        metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},\n",
      "    ),\n",
      "    Document(\n",
      "        page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
      "        metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
      "    ),\n",
      "    # 추가 문서들...\n",
      "]\n",
      "\n",
      "# 벡터 저장소 생성\n",
      "vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())\n",
      "\n",
      "# 메타데이터 필드 정보 설정\n",
      "metadata_field_info = [\n",
      "    AttributeInfo(\n",
      "        name=\"genre\",\n",
      "        description=\"The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']\",\n",
      "        type=\"string\",\n",
      "    ),\n",
      "    AttributeInfo(\n",
      "        name=\"year\",\n",
      "        description=\"The year the movie was released\",\n",
      "        type=\"integer\",\n",
      "    ),\n",
      "    AttributeInfo(\n",
      "        name=\"director\",\n",
      "        description=\"The name of the movie director\",\n",
      "        type=\"string\",\n",
      "    ),\n",
      "    AttributeInfo(\n",
      "        name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
      "    ),\n",
      "]\n",
      "\n",
      "# 문서 내용 설명\n",
      "document_content_description = \"Brief summary of a movie\"\n",
      "\n",
      "# LLM 설정\n",
      "llm = ChatOpenAI(temperature=0)\n",
      "\n",
      "# SelfQueryRetriever 생성\n",
      "retriever = SelfQueryRetriever.from_llm(\n",
      "    llm,\n",
      "    vectorstore,\n",
      "    document_content_description,\n",
      "    metadata_field_info,\n",
      ")\n",
      "\n",
      "# 쿼리 실행 예시\n",
      "print(retriever.invoke(\"I want to watch a movie rated higher than 8.5\"))\n",
      "print(retriever.invoke(\"Has Greta Gerwig directed any movies about women\"))\n",
      "```\n",
      "\n",
      "이 코드는 self-querying retriever를 설정하고, 사용자가 입력한 자연어 쿼리를 기반으로 문서의 메타데이터를 필터링하여 관련 문서를 검색하는 방법을 보여줍니다."
     ]
    }
   ],
   "source": [
    "# Low Level 질문 실행\n",
    "answer = rag_chain.stream(\"self-querying 방법과 예시 코드를 작성해 주세요.\")\n",
    "stream_response(answer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "52666307",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PydanticOutputParser를 활용한 예시 코드는 다음과 같습니다. 이 코드는 Pydantic 모델을 사용하여 LLM의 출력을 구조화된 데이터로 파싱하는 방법을 보여줍니다.\n",
      "\n",
      "```python\n",
      "from pydantic import BaseModel\n",
      "from langchain_core.output_parsers import PydanticOutputParser\n",
      "\n",
      "# Pydantic 모델 정의\n",
      "class MyModel(BaseModel):\n",
      "    name: str\n",
      "    age: int\n",
      "\n",
      "# PydanticOutputParser 인스턴스 생성\n",
      "parser = PydanticOutputParser(pydantic_object=MyModel)\n",
      "\n",
      "# LLM의 출력 예시\n",
      "llm_output = '{\"name\": \"John Doe\", \"age\": 30}'\n",
      "\n",
      "# 출력 파싱\n",
      "parsed_output = parser.parse(llm_output)\n",
      "\n",
      "print(parsed_output)\n",
      "```\n",
      "\n",
      "이 코드는 `MyModel`이라는 Pydantic 모델을 정의하고, `PydanticOutputParser`를 사용하여 LLM의 JSON 형식 출력을 `MyModel` 객체로 파싱합니다."
     ]
    }
   ],
   "source": [
    "# Low Level 질문 실행\n",
    "answer = rag_chain.stream(\"PydanticOutputParser 을 활용한 예시 코드를 작성해 주세요.\")\n",
    "stream_response(answer)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py-test",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
